8 D&H Ch8 - Regressor Importance: “politics”

Darlington & Hayes, Chapter 8’s example

# install.packages("remotes")
# remotes::install_github("sarbearschwartz/apaSupp")
# remotes::install_github("ddsjoberg/gtsummary")

library(tidyverse) 
library(haven)
library(flextable)
library(apaSupp)
library(car)
library(rempsyc)
library(parameters)
library(performance)
library(interactions)
library(ggResidpanel)
library(relaimpo)
library(dominanceanalysis)
library(domir)
flextable::set_flextable_defaults(digits = 2)

8.1 PURPOSE

RESEARCH QUESTION:

RQ1) Is news consumption via national news broadcast and newspaper correlated with knowledge of the political process, after accounting for sex and age?

RQ2) If so, do both sources of news have the same effect on political knowledge?

RQ3) Is listening to political talk radio more or less important than watching the national network news broadcast?

8.1.1 Data Import

You can download the politics dataset here:

df_spss <- haven::read_sav("politics.sav")
tibble::glimpse(df_spss)
Rows: 20
Columns: 4
$ mstatus <dbl+lbl> 3, 2, 4, 4, 1, 3, 2, 2, 3, 1, 3, 4, 1, 1, 2, 1, 3, 2, 4, 1
$ satis   <dbl> 85, 80, 72, 60, 92, 88, 74, 84, 88, 82, 76, 78, 78, 93, 73…
$ income  <dbl> 53, 65, 54, 35, 73, 75, 57, 59, 60, 52, 47, 60, 63, 66, 51, 53…
$ sex     <dbl+lbl> 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1
summary(df_spss)
    mstatus         satis           income           sex      
 Min.   :1.00   Min.   :60.00   Min.   :35.00   Min.   :0.00  
 1st Qu.:1.00   1st Qu.:75.75   1st Qu.:52.75   1st Qu.:0.00  
 Median :2.00   Median :80.00   Median :56.00   Median :0.00  
 Mean   :2.35   Mean   :80.35   Mean   :56.90   Mean   :0.45  
 3rd Qu.:3.00   3rd Qu.:85.75   3rd Qu.:61.50   3rd Qu.:1.00  
 Max.   :4.00   Max.   :93.00   Max.   :75.00   Max.   :1.00  

When importing data from SPSS (.sav), you need to be careful how categorical vairables are stored.

df_fac <- df_spss %>% 
  haven::as_factor()
tibble::glimpse(df_fac)
Rows: 20
Columns: 4
$ mstatus <fct> single, divorced, widowed, widowed, married, single, divorced,…
$ satis   <dbl> 85, 80, 72, 60, 92, 88, 74, 84, 88, 82, 76, 78, 78, 93, 73, 80…
$ income  <dbl> 53, 65, 54, 35, 73, 75, 57, 59, 60, 52, 47, 60, 63, 66, 51, 53…
$ sex     <fct> female, male, female, female, male, male, female, female, fema…
summary(df_fac)
     mstatus      satis           income          sex    
 married :6   Min.   :60.00   Min.   :35.00   female:11  
 divorced:5   1st Qu.:75.75   1st Qu.:52.75   male  : 9  
 single  :5   Median :80.00   Median :56.00              
 widowed :4   Mean   :80.35   Mean   :56.90              
              3rd Qu.:85.75   3rd Qu.:61.50              
              Max.   :93.00   Max.   :75.00              

8.1.2 Data Description

National survey of residents of the United States.

Participants were asked a set of questios used to quantify their knowledge of politics, politicians, and the political process.

Assumption: knowledge is caused by exposure to information

8.1.2.1 Variables

Dependent Variable (outcome, Y)

  • pknow knowledge of the political process

Independent Variables (predictors or regressors, X’s)

Frequency of Exposure (days per week)…

  • talkrad listening to political talk radio
  • natnews watch national news broadcasts
  • npnews read newspaper
  • locnews watch local news

Covariates

  • age age in years, 18-90
  • sex sex, Male vs. Female

Categorical variables MUST be declare as FACTORS and the FIRST level listed is treated as the reference category.

df_pol <- df_spss %>% 
  haven::as_factor() %>% 
  haven::zap_label() %>% 
  tibble::rowid_to_column(var = "id") %>% 
  dplyr::mutate(news_sum = (natnews + npnews)/2) %>% 
  dplyr::mutate(news_dif = (natnews - npnews)/2)
tibble::glimpse(df_pol)
Rows: 340
Columns: 19
$ id       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ pknow    <dbl> 8, 12, 11, 13, 14, 8, 10, 15, 9, 6, 8, 9, 10, 2, 9, 6, 4, 15,…
$ age      <dbl> 35, 40, 43, 26, 41, 41, 18, 31, 18, 72, 43, 43, 63, 32, 30, 2…
$ educ     <dbl> 13, 14, 14, 16, 12, 12, 12, 14, 11, 12, 10, 17, 12, 14, 16, 1…
$ sex      <fct> Female, Female, Male, Female, Female, Male, Female, Male, Fem…
$ income   <dbl> 42.5, 42.5, 110.0, 100.0, 57.5, 80.0, 42.5, 90.0, 30.0, 30.0,…
$ polint   <dbl> 2, 2, 2, 3, 2, 2, 2, 3, 3, 2, 2, 3, 3, 2, 3, 2, 1, 3, 2, 2, 3…
$ party    <fct> Democrat, Republican, Republican, Democrat, Republican, Democ…
$ libcon   <dbl> 3, 6, 4, 2, 3, 3, 2, 7, 5, 6, 5, 3, 3, 5, 3, 1, 6, 6, 5, 3, 5…
$ pdiscuss <dbl> 4, 3, 3, 2, 7, 3, 2, 7, 5, 7, 2, 0, 7, 7, 7, 3, 7, 7, 7, 5, 7…
$ natnews  <dbl> 3, 0, 1, 3, 0, 0, 2, 2, 3, 0, 3, 4, 7, 3, 1, 0, 3, 0, 2, 0, 5…
$ npnews   <dbl> 1, 4, 2, 5, 0, 7, 4, 0, 7, 7, 3, 1, 4, 1, 1, 3, 1, 7, 2, 7, 3…
$ locnews  <dbl> 3.5, 3.5, 3.5, 5.0, 5.0, 3.5, 1.5, 2.0, 7.0, 7.0, 0.5, 4.0, 3…
$ talkrad  <dbl> 4.5, 3.5, 3.5, 1.0, 1.0, 1.0, 1.0, 4.5, 1.0, 1.0, 1.0, 1.0, 2…
$ ses      <dbl> -0.58, -0.34, 0.50, 0.85, -0.63, -0.34, -0.81, 0.25, -1.21, -…
$ news     <dbl> 2.50, 2.50, 2.16, 4.33, 1.66, 3.50, 2.50, 1.33, 5.66, 4.66, 2…
$ demoneg  <dbl> 3.14, 2.85, 2.00, 2.00, 2.00, 2.00, 2.00, 2.57, 1.85, 2.57, 2…
$ news_sum <dbl> 2.0, 2.0, 1.5, 4.0, 0.0, 3.5, 3.0, 1.0, 5.0, 3.5, 3.0, 2.5, 5…
$ news_dif <dbl> 1.0, -2.0, -0.5, -1.0, 0.0, -3.5, -1.0, 1.0, -2.0, -3.5, 0.0,…
summary(df_pol)
       id             pknow            age             educ           sex     
 Min.   :  1.00   Min.   : 0.00   Min.   :18.00   Min.   : 6.00   Female:177  
 1st Qu.: 85.75   1st Qu.: 8.00   1st Qu.:35.00   1st Qu.:12.00   Male  :163  
 Median :170.50   Median :11.00   Median :43.00   Median :14.00               
 Mean   :170.50   Mean   :11.31   Mean   :44.93   Mean   :14.29               
 3rd Qu.:255.25   3rd Qu.:15.00   3rd Qu.:54.25   3rd Qu.:16.00               
 Max.   :340.00   Max.   :21.00   Max.   :90.00   Max.   :17.00               
     income           polint             party         libcon   
 Min.   :  2.50   Min.   :1.000   Democrat  :141   Min.   :1.0  
 1st Qu.: 42.50   1st Qu.:2.000   Republican:146   1st Qu.:3.0  
 Median : 57.50   Median :3.000   Other     : 53   Median :5.0  
 Mean   : 64.43   Mean   :2.818                    Mean   :4.5  
 3rd Qu.: 80.00   3rd Qu.:3.000                    3rd Qu.:6.0  
 Max.   :200.00   Max.   :4.000                    Max.   :7.0  
    pdiscuss        natnews          npnews         locnews     
 Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
 1st Qu.:3.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
 Median :7.000   Median :3.000   Median :3.000   Median :3.000  
 Mean   :4.865   Mean   :3.412   Mean   :3.574   Mean   :2.966  
 3rd Qu.:7.000   3rd Qu.:7.000   3rd Qu.:7.000   3rd Qu.:4.500  
 Max.   :7.000   Max.   :7.000   Max.   :7.000   Max.   :7.000  
    talkrad           ses                  news          demoneg     
 Min.   :1.000   Min.   :-2.6200000   Min.   :0.000   Min.   :1.000  
 1st Qu.:1.000   1st Qu.:-0.5850000   1st Qu.:2.000   1st Qu.:1.830  
 Median :1.000   Median :-0.0750000   Median :3.080   Median :2.140  
 Mean   :2.009   Mean   : 0.0003235   Mean   :3.314   Mean   :2.242  
 3rd Qu.:3.000   3rd Qu.: 0.5000000   3rd Qu.:4.660   3rd Qu.:2.710  
 Max.   :5.000   Max.   : 2.1000000   Max.   :7.000   Max.   :4.000  
    news_sum        news_dif       
 Min.   :0.000   Min.   :-3.50000  
 1st Qu.:1.500   1st Qu.:-1.00000  
 Median :3.500   Median : 0.00000  
 Mean   :3.493   Mean   :-0.08088  
 3rd Qu.:5.000   3rd Qu.: 1.00000  
 Max.   :7.000   Max.   : 3.50000  

8.2 REGRESSION ANALYSIS

8.2.1 RQ1) Fit Main Model

fit_pol_1 <- lm(pknow ~ natnews + npnews + age + sex,
                data = df_pol)
apaSupp::tab_lm(fit_pol_1,
                var_labels = c(natnews = "National News",
                               npnews = "Newspapers",
                               age = "Age",
                               sex = "Sex"),
                d = 3) %>% 
  flextable::width(j = 1, width = 1.5)
Table 8.1
Regression Model

Variable

b

(SE)

p

b*

𝜂²

𝜂ₚ²

(Intercept)

8.795

(0.733)

< .001 ***

National News

0.155

(0.089)

.082

0.094

.008

.009

Newspapers

0.371

(0.084)

< .001 ***

0.239

.049

.055

Age

-0.009

(0.017)

.578

-0.031

.001

.001

Sex

.064

.070

Female

Male

2.245

(0.445)

< .001 ***

0.161

Adjusted R²

0.150

Note. b* = standardized estimate. 𝜂² = semi-partial correlation. 𝜂ₚ² = partial correlation.

* p < .05. ** p < .01. *** p < .001.

8.2.2 RQ2) Parameter Equivalence

fit_pol_2 <- lm(pknow ~ news_sum + age + sex,
                data = df_pol)

fit_pol_3 <- lm(pknow ~ news_sum + news_dif + age + sex,
                data = df_pol)
apaSupp::tab_lms(list(fit_pol_1, fit_pol_2, fit_pol_3),
                 var_labels = c(news_sum = "News, sum",
                                news_dif = "News, dif",
                                age = "Age",
                                sex = "Sex"))
Table 8.2
Compare Regression Models

Model 1

Model 2

Model 3

Variable

b

(SE)

p

b

(SE)

p

b

(SE)

p

(Intercept)

8.79

(0.73)

< .001 ***

8.78

(0.73)

< .001 ***

8.79

(0.73)

< .001 ***

natnews

0.15

(0.09)

.082

npnews

0.37

(0.08)

< .001 ***

Age

-0.01

(0.02)

.578

-0.01

(0.02)

.540

-0.01

(0.02)

.578

Sex

Female

Male

2.25

(0.45)

< .001 ***

2.33

(0.44)

< .001 ***

2.25

(0.45)

< .001 ***

News, sum

0.54

(0.11)

< .001 ***

0.53

(0.11)

< .001 ***

News, dif

-0.22

(0.13)

.097

0.16

0.15

0.16

Adjusted R²

0.15

0.15

0.15

* p < .05. ** p < .01. *** p < .001.

car::linearHypothesis(fit_pol_1, c("natnews - npnews"))
# A tibble: 2 × 6
  Res.Df   RSS    Df `Sum of Sq`     F `Pr(>F)`
   <dbl> <dbl> <dbl>       <dbl> <dbl>    <dbl>
1    336 5487.    NA        NA   NA     NA     
2    335 5442.     1        45.1  2.78   0.0966

8.2.3 RQ3) Variable Contribution

8.2.3.1 Variable Importance

lm(pknow ~ npnews + locnews + talkrad + natnews,
   data = df_pol) %>% 
  caret::varImp()
# A tibble: 4 × 1
  Overall
    <dbl>
1    6.03
2    4.14
3    4.90
4    2.27
lm(pknow ~ npnews + locnews + talkrad + natnews,
   data = df_pol) %>% 
  relaimpo::calc.relimp(type = "lmg", importance = TRUE)
Response variable: pknow 
Total response variance: 19.12264 
Analysis based on 340 observations 

4 Regressors: 
npnews locnews talkrad natnews 
Proportion of variance explained by model: 19.71%
Metrics are not normalized (rela=FALSE). 

Relative importance metrics: 

               lmg
npnews  0.08787767
locnews 0.02812186
talkrad 0.06292808
natnews 0.01819988

Average coefficients for different model sizes: 

                1X        2Xs        3Xs        4Xs
npnews   0.4629151  0.4650344  0.4695145  0.4716805
locnews -0.2108792 -0.3214204 -0.3986934 -0.4440727
talkrad  0.8869822  0.8677825  0.8527704  0.8324193
natnews  0.2444027  0.2313024  0.2191853  0.2086020

8.2.3.2 Relative Improvement in Fit

Set A: * newspaper * localnews

Add: * national news * talk ratio

data.frame(A = c("None",
                 "Newspaper",
                 "Local News",
                 "Both"),
           base = c("pknow ~ 1", 
                    "pknow ~ npnews", 
                    "pknow ~ locnews", 
                    "pknow ~ npnews+locnews")) %>% 
  dplyr::mutate(add_talk = paste0(base, "+talkrad")) %>% 
  dplyr::mutate(add_natn = paste0(base, "+natnews")) %>% 
  dplyr::mutate(fit_base = purrr::map(base,
                                      ~lm(.x, data = df_pol))) %>%
  dplyr::mutate(fit_talk = purrr::map(add_talk,
                                      ~lm(.x, data = df_pol))) %>%
  dplyr::mutate(fit_natn = purrr::map(add_natn,
                                      ~lm(.x, data = df_pol))) %>%
  dplyr::mutate(R2 = purrr::map_dbl(fit_base,
                                    ~ broom::glance(.x)$r.squared)) %>% 
  dplyr::mutate(R2i = purrr::map_dbl(fit_talk,
                                     ~ broom::glance(.x)$r.squared)) %>% 
  dplyr::mutate(R2j = purrr::map_dbl(fit_natn,
                                     ~ broom::glance(.x)$r.squared)) %>% 
  dplyr::mutate(R  = sqrt(R2)) %>% 
  dplyr::mutate(Ri = sqrt(R2i)) %>% 
  dplyr::mutate(Rj = sqrt(R2j)) %>% 
  dplyr::mutate(dRi = Ri - R) %>% 
  dplyr::mutate(dRj = Rj - R) %>% 
  dplyr::select("Set A subset" = A,
                "R" = R, 
                "Adding\nTalk Radio\nR" = Ri,
                "Adding\nNational News\nR" = Rj,
                "Talk Radio\nChange\nin R" = Ri,
                "National News\nChange\nin R" = Rj) %>% 
  flextable::flextable() %>% 
  apaSupp::theme_apa(caption = "Table 8.2 - Relative Improvement in Fit for Dominance Computations",
                     d = 3)
Table 8.3
Table 8.2 - Relative Improvement in Fit for Dominance Computations

Set A subset

R

Adding
Talk Radio
R

Adding
National News
R

Talk Radio
Change
in R

National News
Change
in R

None

0.000

0.261

0.148

0.261

0.148

Newspaper

0.298

0.393

0.310

0.393

0.310

Local News

0.106

0.288

0.237

0.288

0.237

Both

0.338

0.430

0.373

0.430

0.373

8.2.3.3 Dominance Ananlysis

see: https://cran.r-project.org/web/packages/domir/vignettes/domir_basics.html

domir(
  pknow ~ npnews + locnews + talkrad + natnews, 
  function(formula) {
    lm_model <- lm(formula, data = df_pol)
    summary(lm_model)[["r.squared"]]
  }
)
Overall Value:      0.1971275 

General Dominance Values:
        General Dominance Standardized Ranks
npnews         0.08787767   0.44579104     1
locnews        0.02812186   0.14265824     3
talkrad        0.06292808   0.31922529     2
natnews        0.01819988   0.09232542     4

Conditional Dominance Values:
        Include At: 1 Include At: 2 Include At: 3 Include At: 4
npnews     0.08896006    0.08783655    0.08769250    0.08702156
locnews    0.01133509    0.02477780    0.03521980    0.04115476
talkrad    0.06808081    0.06448014    0.06150433    0.05764703
natnews    0.02193340    0.02061423    0.01791688    0.01233499

Complete Dominance Proportions:
          > npnews > locnews > talkrad > natnews
npnews >        NA      1.00         1      1.00
locnews >        0        NA         0      0.75
talkrad >        0      1.00        NA      1.00
natnews >        0      0.25         0        NA