8 D&H Ch8 - Regressor Importance: “politics”
Darlington & Hayes, Chapter 8’s example
# install.packages("remotes")
# remotes::install_github("sarbearschwartz/apaSupp")
# remotes::install_github("ddsjoberg/gtsummary")
library(tidyverse)
library(haven)
library(flextable)
library(apaSupp)
library(car)
library(rempsyc)
library(parameters)
library(performance)
library(interactions)
library(ggResidpanel)
library(relaimpo)
library(dominanceanalysis)
library(domir)
8.1 PURPOSE
RESEARCH QUESTION:
RQ1) Is news consumption via national news broadcast and newspaper correlated with knowledge of the political process, after accounting for sex and age?
RQ2) If so, do both sources of news have the same effect on political knowledge?
RQ3) Is listening to political talk radio more or less important than watching the national network news broadcast?
8.1.1 Data Import
You can download the politics
dataset here:
Rows: 20
Columns: 4
$ mstatus <dbl+lbl> 3, 2, 4, 4, 1, 3, 2, 2, 3, 1, 3, 4, 1, 1, 2, 1, 3, 2, 4, 1
$ satis <dbl> 85, 80, 72, 60, 92, 88, 74, 84, 88, 82, 76, 78, 78, 93, 73…
$ income <dbl> 53, 65, 54, 35, 73, 75, 57, 59, 60, 52, 47, 60, 63, 66, 51, 53…
$ sex <dbl+lbl> 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1
mstatus satis income sex
Min. :1.00 Min. :60.00 Min. :35.00 Min. :0.00
1st Qu.:1.00 1st Qu.:75.75 1st Qu.:52.75 1st Qu.:0.00
Median :2.00 Median :80.00 Median :56.00 Median :0.00
Mean :2.35 Mean :80.35 Mean :56.90 Mean :0.45
3rd Qu.:3.00 3rd Qu.:85.75 3rd Qu.:61.50 3rd Qu.:1.00
Max. :4.00 Max. :93.00 Max. :75.00 Max. :1.00
When importing data from SPSS (.sav), you need to be careful how categorical vairables are stored.
Rows: 20
Columns: 4
$ mstatus <fct> single, divorced, widowed, widowed, married, single, divorced,…
$ satis <dbl> 85, 80, 72, 60, 92, 88, 74, 84, 88, 82, 76, 78, 78, 93, 73, 80…
$ income <dbl> 53, 65, 54, 35, 73, 75, 57, 59, 60, 52, 47, 60, 63, 66, 51, 53…
$ sex <fct> female, male, female, female, male, male, female, female, fema…
mstatus satis income sex
married :6 Min. :60.00 Min. :35.00 female:11
divorced:5 1st Qu.:75.75 1st Qu.:52.75 male : 9
single :5 Median :80.00 Median :56.00
widowed :4 Mean :80.35 Mean :56.90
3rd Qu.:85.75 3rd Qu.:61.50
Max. :93.00 Max. :75.00
8.1.2 Data Description
National survey of residents of the United States.
Participants were asked a set of questios used to quantify their knowledge of politics, politicians, and the political process.
Assumption: knowledge is caused by exposure to information
8.1.2.1 Variables
Dependent Variable (outcome, Y)
pknow
knowledge of the political process
Independent Variables (predictors or regressors, X’s)
Frequency of Exposure (days per week)…
talkrad
listening to political talk radionatnews
watch national news broadcastsnpnews
read newspaperlocnews
watch local news
Covariates
age
age in years, 18-90sex
sex, Male vs. Female
Categorical variables MUST be declare as FACTORS and the FIRST level listed is treated as the reference category.
df_pol <- df_spss %>%
haven::as_factor() %>%
haven::zap_label() %>%
tibble::rowid_to_column(var = "id") %>%
dplyr::mutate(news_sum = (natnews + npnews)/2) %>%
dplyr::mutate(news_dif = (natnews - npnews)/2)
Rows: 340
Columns: 19
$ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ pknow <dbl> 8, 12, 11, 13, 14, 8, 10, 15, 9, 6, 8, 9, 10, 2, 9, 6, 4, 15,…
$ age <dbl> 35, 40, 43, 26, 41, 41, 18, 31, 18, 72, 43, 43, 63, 32, 30, 2…
$ educ <dbl> 13, 14, 14, 16, 12, 12, 12, 14, 11, 12, 10, 17, 12, 14, 16, 1…
$ sex <fct> Female, Female, Male, Female, Female, Male, Female, Male, Fem…
$ income <dbl> 42.5, 42.5, 110.0, 100.0, 57.5, 80.0, 42.5, 90.0, 30.0, 30.0,…
$ polint <dbl> 2, 2, 2, 3, 2, 2, 2, 3, 3, 2, 2, 3, 3, 2, 3, 2, 1, 3, 2, 2, 3…
$ party <fct> Democrat, Republican, Republican, Democrat, Republican, Democ…
$ libcon <dbl> 3, 6, 4, 2, 3, 3, 2, 7, 5, 6, 5, 3, 3, 5, 3, 1, 6, 6, 5, 3, 5…
$ pdiscuss <dbl> 4, 3, 3, 2, 7, 3, 2, 7, 5, 7, 2, 0, 7, 7, 7, 3, 7, 7, 7, 5, 7…
$ natnews <dbl> 3, 0, 1, 3, 0, 0, 2, 2, 3, 0, 3, 4, 7, 3, 1, 0, 3, 0, 2, 0, 5…
$ npnews <dbl> 1, 4, 2, 5, 0, 7, 4, 0, 7, 7, 3, 1, 4, 1, 1, 3, 1, 7, 2, 7, 3…
$ locnews <dbl> 3.5, 3.5, 3.5, 5.0, 5.0, 3.5, 1.5, 2.0, 7.0, 7.0, 0.5, 4.0, 3…
$ talkrad <dbl> 4.5, 3.5, 3.5, 1.0, 1.0, 1.0, 1.0, 4.5, 1.0, 1.0, 1.0, 1.0, 2…
$ ses <dbl> -0.58, -0.34, 0.50, 0.85, -0.63, -0.34, -0.81, 0.25, -1.21, -…
$ news <dbl> 2.50, 2.50, 2.16, 4.33, 1.66, 3.50, 2.50, 1.33, 5.66, 4.66, 2…
$ demoneg <dbl> 3.14, 2.85, 2.00, 2.00, 2.00, 2.00, 2.00, 2.57, 1.85, 2.57, 2…
$ news_sum <dbl> 2.0, 2.0, 1.5, 4.0, 0.0, 3.5, 3.0, 1.0, 5.0, 3.5, 3.0, 2.5, 5…
$ news_dif <dbl> 1.0, -2.0, -0.5, -1.0, 0.0, -3.5, -1.0, 1.0, -2.0, -3.5, 0.0,…
id pknow age educ sex
Min. : 1.00 Min. : 0.00 Min. :18.00 Min. : 6.00 Female:177
1st Qu.: 85.75 1st Qu.: 8.00 1st Qu.:35.00 1st Qu.:12.00 Male :163
Median :170.50 Median :11.00 Median :43.00 Median :14.00
Mean :170.50 Mean :11.31 Mean :44.93 Mean :14.29
3rd Qu.:255.25 3rd Qu.:15.00 3rd Qu.:54.25 3rd Qu.:16.00
Max. :340.00 Max. :21.00 Max. :90.00 Max. :17.00
income polint party libcon
Min. : 2.50 Min. :1.000 Democrat :141 Min. :1.0
1st Qu.: 42.50 1st Qu.:2.000 Republican:146 1st Qu.:3.0
Median : 57.50 Median :3.000 Other : 53 Median :5.0
Mean : 64.43 Mean :2.818 Mean :4.5
3rd Qu.: 80.00 3rd Qu.:3.000 3rd Qu.:6.0
Max. :200.00 Max. :4.000 Max. :7.0
pdiscuss natnews npnews locnews
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:3.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :7.000 Median :3.000 Median :3.000 Median :3.000
Mean :4.865 Mean :3.412 Mean :3.574 Mean :2.966
3rd Qu.:7.000 3rd Qu.:7.000 3rd Qu.:7.000 3rd Qu.:4.500
Max. :7.000 Max. :7.000 Max. :7.000 Max. :7.000
talkrad ses news demoneg
Min. :1.000 Min. :-2.6200000 Min. :0.000 Min. :1.000
1st Qu.:1.000 1st Qu.:-0.5850000 1st Qu.:2.000 1st Qu.:1.830
Median :1.000 Median :-0.0750000 Median :3.080 Median :2.140
Mean :2.009 Mean : 0.0003235 Mean :3.314 Mean :2.242
3rd Qu.:3.000 3rd Qu.: 0.5000000 3rd Qu.:4.660 3rd Qu.:2.710
Max. :5.000 Max. : 2.1000000 Max. :7.000 Max. :4.000
news_sum news_dif
Min. :0.000 Min. :-3.50000
1st Qu.:1.500 1st Qu.:-1.00000
Median :3.500 Median : 0.00000
Mean :3.493 Mean :-0.08088
3rd Qu.:5.000 3rd Qu.: 1.00000
Max. :7.000 Max. : 3.50000
8.2 REGRESSION ANALYSIS
8.2.1 RQ1) Fit Main Model
apaSupp::tab_lm(fit_pol_1,
var_labels = c(natnews = "National News",
npnews = "Newspapers",
age = "Age",
sex = "Sex"),
d = 3) %>%
flextable::width(j = 1, width = 1.5)
Variable | b | (SE) | p | b* | 𝜂² | 𝜂ₚ² |
---|---|---|---|---|---|---|
(Intercept) | 8.795 | (0.733) | < .001 *** | |||
National News | 0.155 | (0.089) | .082 | 0.094 | .008 | .009 |
Newspapers | 0.371 | (0.084) | < .001 *** | 0.239 | .049 | .055 |
Age | -0.009 | (0.017) | .578 | -0.031 | .001 | .001 |
Sex | .064 | .070 | ||||
Female | — | — | ||||
Male | 2.245 | (0.445) | < .001 *** | |||
R² | 0.161 | |||||
Adjusted R² | 0.150 | |||||
Note. b* = standardized estimate. 𝜂² = semi-partial correlation. 𝜂ₚ² = partial correlation. | ||||||
* p < .05. ** p < .01. *** p < .001. |
8.2.2 RQ2) Parameter Equivalence
fit_pol_2 <- lm(pknow ~ news_sum + age + sex,
data = df_pol)
fit_pol_3 <- lm(pknow ~ news_sum + news_dif + age + sex,
data = df_pol)
apaSupp::tab_lms(list(fit_pol_1, fit_pol_2, fit_pol_3),
var_labels = c(news_sum = "News, sum",
news_dif = "News, dif",
age = "Age",
sex = "Sex"))
| Model 1 | Model 2 | Model 3 | ||||||
---|---|---|---|---|---|---|---|---|---|
Variable | b | (SE) | p | b | (SE) | p | b | (SE) | p |
(Intercept) | 8.79 | (0.73) | < .001 *** | 8.78 | (0.73) | < .001 *** | 8.79 | (0.73) | < .001 *** |
natnews | 0.15 | (0.09) | .082 | ||||||
npnews | 0.37 | (0.08) | < .001 *** | ||||||
Age | -0.01 | (0.02) | .578 | -0.01 | (0.02) | .540 | -0.01 | (0.02) | .578 |
Sex | |||||||||
Female | — | — | — | — | — | — | |||
Male | 2.25 | (0.45) | < .001 *** | 2.33 | (0.44) | < .001 *** | 2.25 | (0.45) | < .001 *** |
News, sum | 0.54 | (0.11) | < .001 *** | 0.53 | (0.11) | < .001 *** | |||
News, dif | -0.22 | (0.13) | .097 | ||||||
R² | 0.16 | 0.15 | 0.16 | ||||||
Adjusted R² | 0.15 | 0.15 | 0.15 | ||||||
* p < .05. ** p < .01. *** p < .001. |
# A tibble: 2 × 6
Res.Df RSS Df `Sum of Sq` F `Pr(>F)`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 336 5487. NA NA NA NA
2 335 5442. 1 45.1 2.78 0.0966
8.2.3 RQ3) Variable Contribution
8.2.3.1 Variable Importance
# A tibble: 4 × 1
Overall
<dbl>
1 6.03
2 4.14
3 4.90
4 2.27
lm(pknow ~ npnews + locnews + talkrad + natnews,
data = df_pol) %>%
relaimpo::calc.relimp(type = "lmg", importance = TRUE)
Response variable: pknow
Total response variance: 19.12264
Analysis based on 340 observations
4 Regressors:
npnews locnews talkrad natnews
Proportion of variance explained by model: 19.71%
Metrics are not normalized (rela=FALSE).
Relative importance metrics:
lmg
npnews 0.08787767
locnews 0.02812186
talkrad 0.06292808
natnews 0.01819988
Average coefficients for different model sizes:
1X 2Xs 3Xs 4Xs
npnews 0.4629151 0.4650344 0.4695145 0.4716805
locnews -0.2108792 -0.3214204 -0.3986934 -0.4440727
talkrad 0.8869822 0.8677825 0.8527704 0.8324193
natnews 0.2444027 0.2313024 0.2191853 0.2086020
8.2.3.2 Relative Improvement in Fit
Set A: * newspaper * localnews
Add: * national news * talk ratio
data.frame(A = c("None",
"Newspaper",
"Local News",
"Both"),
base = c("pknow ~ 1",
"pknow ~ npnews",
"pknow ~ locnews",
"pknow ~ npnews+locnews")) %>%
dplyr::mutate(add_talk = paste0(base, "+talkrad")) %>%
dplyr::mutate(add_natn = paste0(base, "+natnews")) %>%
dplyr::mutate(fit_base = purrr::map(base,
~lm(.x, data = df_pol))) %>%
dplyr::mutate(fit_talk = purrr::map(add_talk,
~lm(.x, data = df_pol))) %>%
dplyr::mutate(fit_natn = purrr::map(add_natn,
~lm(.x, data = df_pol))) %>%
dplyr::mutate(R2 = purrr::map_dbl(fit_base,
~ broom::glance(.x)$r.squared)) %>%
dplyr::mutate(R2i = purrr::map_dbl(fit_talk,
~ broom::glance(.x)$r.squared)) %>%
dplyr::mutate(R2j = purrr::map_dbl(fit_natn,
~ broom::glance(.x)$r.squared)) %>%
dplyr::mutate(R = sqrt(R2)) %>%
dplyr::mutate(Ri = sqrt(R2i)) %>%
dplyr::mutate(Rj = sqrt(R2j)) %>%
dplyr::mutate(dRi = Ri - R) %>%
dplyr::mutate(dRj = Rj - R) %>%
dplyr::select("Set A subset" = A,
"R" = R,
"Adding\nTalk Radio\nR" = Ri,
"Adding\nNational News\nR" = Rj,
"Talk Radio\nChange\nin R" = Ri,
"National News\nChange\nin R" = Rj) %>%
flextable::flextable() %>%
apaSupp::theme_apa(caption = "Table 8.2 - Relative Improvement in Fit for Dominance Computations",
d = 3)
Set A subset | R | Adding | Adding | Talk Radio | National News |
---|---|---|---|---|---|
None | 0.000 | 0.261 | 0.148 | 0.261 | 0.148 |
Newspaper | 0.298 | 0.393 | 0.310 | 0.393 | 0.310 |
Local News | 0.106 | 0.288 | 0.237 | 0.288 | 0.237 |
Both | 0.338 | 0.430 | 0.373 | 0.430 | 0.373 |
8.2.3.3 Dominance Ananlysis
see: https://cran.r-project.org/web/packages/domir/vignettes/domir_basics.html
domir(
pknow ~ npnews + locnews + talkrad + natnews,
function(formula) {
lm_model <- lm(formula, data = df_pol)
summary(lm_model)[["r.squared"]]
}
)
Overall Value: 0.1971275
General Dominance Values:
General Dominance Standardized Ranks
npnews 0.08787767 0.44579104 1
locnews 0.02812186 0.14265824 3
talkrad 0.06292808 0.31922529 2
natnews 0.01819988 0.09232542 4
Conditional Dominance Values:
Include At: 1 Include At: 2 Include At: 3 Include At: 4
npnews 0.08896006 0.08783655 0.08769250 0.08702156
locnews 0.01133509 0.02477780 0.03521980 0.04115476
talkrad 0.06808081 0.06448014 0.06150433 0.05764703
natnews 0.02193340 0.02061423 0.01791688 0.01233499
Complete Dominance Proportions:
> npnews > locnews > talkrad > natnews
npnews > NA 1.00 1 1.00
locnews > 0 NA 0 0.75
talkrad > 0 1.00 NA 1.00
natnews > 0 0.25 0 NA