8 D&H Ch8 - Regressor Importance: “politics”

Darlington & Hayes, Chapter 8’s example

Darlington, Richard B., and Andrew F. Hayes. (2016) Regression Analysis and Linear Models: Concepts, Applications, and Implementation, Guilford Publications.

# install.packages("remotes")
# remotes::install_github("sarbearschwartz/apaSupp")
# remotes::install_github("ddsjoberg/gtsummary")

library(tidyverse) 
library(haven)
library(flextable)
library(apaSupp)
library(car)
library(rempsyc)
library(parameters)
library(performance)
library(interactions)
library(relaimpo)
library(dominanceanalysis)
library(domir)

flextable::set_flextable_defaults(digits = 2)

8.1 PURPOSE

8.1.1 Research Questions

RQ1) Is news consumption via national news broadcast and newspaper correlated with knowledge of the political process, after accounting for sex and age?

RQ2) If so, do both sources of news have the same effect on political knowledge?

RQ3) Is listening to political talk radio more or less important than watching the national network news broadcast?

8.1.2 Data Import

You can download the politics dataset here:

SPSS format (.sav)

df_spss <- haven::read_sav("politics.sav")

tibble::glimpse(df_spss)

Rows: 340
Columns: 16
$ pknow    <dbl> 8, 12, 11, 13, 14, 8, 10, 15, 9, 6, 8, 9, 10, 2, 9, 6, 4, 15,…
$ age      <dbl> 35, 40, 43, 26, 41, 41, 18, 31, 18, 72, 43, 43, 63, 32, 30, 2…
$ educ     <dbl> 13, 14, 14, 16, 12, 12, 12, 14, 11, 12, 10, 17, 12, 14, 16, 1…
$ sex      <dbl+lbl> 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, …
$ income   <dbl> 42.5, 42.5, 110.0, 100.0, 57.5, 80.0, 42.5, 90.0, 30.0, 30.0,…
$ polint   <dbl> 2, 2, 2, 3, 2, 2, 2, 3, 3, 2, 2, 3, 3, 2, 3, 2, 1, 3, 2, 2, 3…
$ party    <dbl+lbl> 1, 2, 2, 1, 2, 1, 2, 2, 1, 3, 2, 1, 3, 2, 1, 1, 2, 2, 1, …
$ libcon   <dbl> 3, 6, 4, 2, 3, 3, 2, 7, 5, 6, 5, 3, 3, 5, 3, 1, 6, 6, 5, 3, 5…
$ pdiscuss <dbl> 4, 3, 3, 2, 7, 3, 2, 7, 5, 7, 2, 0, 7, 7, 7, 3, 7, 7, 7, 5, 7…
$ natnews  <dbl> 3, 0, 1, 3, 0, 0, 2, 2, 3, 0, 3, 4, 7, 3, 1, 0, 3, 0, 2, 0, 5…
$ npnews   <dbl> 1, 4, 2, 5, 0, 7, 4, 0, 7, 7, 3, 1, 4, 1, 1, 3, 1, 7, 2, 7, 3…
$ locnews  <dbl> 3.5, 3.5, 3.5, 5.0, 5.0, 3.5, 1.5, 2.0, 7.0, 7.0, 0.5, 4.0, 3…
$ talkrad  <dbl> 4.5, 3.5, 3.5, 1.0, 1.0, 1.0, 1.0, 4.5, 1.0, 1.0, 1.0, 1.0, 2…
$ ses      <dbl> -0.58, -0.34, 0.50, 0.85, -0.63, -0.34, -0.81, 0.25, -1.21, -…
$ news     <dbl> 2.50, 2.50, 2.16, 4.33, 1.66, 3.50, 2.50, 1.33, 5.66, 4.66, 2…
$ demoneg  <dbl> 3.14, 2.85, 2.00, 2.00, 2.00, 2.00, 2.00, 2.57, 1.85, 2.57, 2…

summary(df_spss)

     pknow            age             educ            sex        
 Min.   : 0.00   Min.   :18.00   Min.   : 6.00   Min.   :0.0000  
 1st Qu.: 8.00   1st Qu.:35.00   1st Qu.:12.00   1st Qu.:0.0000  
 Median :11.00   Median :43.00   Median :14.00   Median :0.0000  
 Mean   :11.31   Mean   :44.93   Mean   :14.29   Mean   :0.4794  
 3rd Qu.:15.00   3rd Qu.:54.25   3rd Qu.:16.00   3rd Qu.:1.0000  
 Max.   :21.00   Max.   :90.00   Max.   :17.00   Max.   :1.0000  
     income           polint          party           libcon       pdiscuss    
 Min.   :  2.50   Min.   :1.000   Min.   :1.000   Min.   :1.0   Min.   :0.000  
 1st Qu.: 42.50   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:3.0   1st Qu.:3.000  
 Median : 57.50   Median :3.000   Median :2.000   Median :5.0   Median :7.000  
 Mean   : 64.43   Mean   :2.818   Mean   :1.741   Mean   :4.5   Mean   :4.865  
 3rd Qu.: 80.00   3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:6.0   3rd Qu.:7.000  
 Max.   :200.00   Max.   :4.000   Max.   :3.000   Max.   :7.0   Max.   :7.000  
    natnews          npnews         locnews         talkrad     
 Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :1.000  
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
 Median :3.000   Median :3.000   Median :3.000   Median :1.000  
 Mean   :3.412   Mean   :3.574   Mean   :2.966   Mean   :2.009  
 3rd Qu.:7.000   3rd Qu.:7.000   3rd Qu.:4.500   3rd Qu.:3.000  
 Max.   :7.000   Max.   :7.000   Max.   :7.000   Max.   :5.000  
      ses                  news          demoneg     
 Min.   :-2.6200000   Min.   :0.000   Min.   :1.000  
 1st Qu.:-0.5850000   1st Qu.:2.000   1st Qu.:1.830  
 Median :-0.0750000   Median :3.080   Median :2.140  
 Mean   : 0.0003235   Mean   :3.314   Mean   :2.242  
 3rd Qu.: 0.5000000   3rd Qu.:4.660   3rd Qu.:2.710  
 Max.   : 2.1000000   Max.   :7.000   Max.   :4.000

When importing data from SPSS (.sav), you need to be careful about how categorical variables are stored.

df_fac <- df_spss %>% 
  haven::as_factor()

tibble::glimpse(df_fac)

Rows: 340
Columns: 16
$ pknow    <dbl> 8, 12, 11, 13, 14, 8, 10, 15, 9, 6, 8, 9, 10, 2, 9, 6, 4, 15,…
$ age      <dbl> 35, 40, 43, 26, 41, 41, 18, 31, 18, 72, 43, 43, 63, 32, 30, 2…
$ educ     <dbl> 13, 14, 14, 16, 12, 12, 12, 14, 11, 12, 10, 17, 12, 14, 16, 1…
$ sex      <fct> Female, Female, Male, Female, Female, Male, Female, Male, Fem…
$ income   <dbl> 42.5, 42.5, 110.0, 100.0, 57.5, 80.0, 42.5, 90.0, 30.0, 30.0,…
$ polint   <dbl> 2, 2, 2, 3, 2, 2, 2, 3, 3, 2, 2, 3, 3, 2, 3, 2, 1, 3, 2, 2, 3…
$ party    <fct> Democrat, Republican, Republican, Democrat, Republican, Democ…
$ libcon   <dbl> 3, 6, 4, 2, 3, 3, 2, 7, 5, 6, 5, 3, 3, 5, 3, 1, 6, 6, 5, 3, 5…
$ pdiscuss <dbl> 4, 3, 3, 2, 7, 3, 2, 7, 5, 7, 2, 0, 7, 7, 7, 3, 7, 7, 7, 5, 7…
$ natnews  <dbl> 3, 0, 1, 3, 0, 0, 2, 2, 3, 0, 3, 4, 7, 3, 1, 0, 3, 0, 2, 0, 5…
$ npnews   <dbl> 1, 4, 2, 5, 0, 7, 4, 0, 7, 7, 3, 1, 4, 1, 1, 3, 1, 7, 2, 7, 3…
$ locnews  <dbl> 3.5, 3.5, 3.5, 5.0, 5.0, 3.5, 1.5, 2.0, 7.0, 7.0, 0.5, 4.0, 3…
$ talkrad  <dbl> 4.5, 3.5, 3.5, 1.0, 1.0, 1.0, 1.0, 4.5, 1.0, 1.0, 1.0, 1.0, 2…
$ ses      <dbl> -0.58, -0.34, 0.50, 0.85, -0.63, -0.34, -0.81, 0.25, -1.21, -…
$ news     <dbl> 2.50, 2.50, 2.16, 4.33, 1.66, 3.50, 2.50, 1.33, 5.66, 4.66, 2…
$ demoneg  <dbl> 3.14, 2.85, 2.00, 2.00, 2.00, 2.00, 2.00, 2.57, 1.85, 2.57, 2…

summary(df_fac)

     pknow            age             educ           sex          income      
 Min.   : 0.00   Min.   :18.00   Min.   : 6.00   Female:177   Min.   :  2.50  
 1st Qu.: 8.00   1st Qu.:35.00   1st Qu.:12.00   Male  :163   1st Qu.: 42.50  
 Median :11.00   Median :43.00   Median :14.00                Median : 57.50  
 Mean   :11.31   Mean   :44.93   Mean   :14.29                Mean   : 64.43  
 3rd Qu.:15.00   3rd Qu.:54.25   3rd Qu.:16.00                3rd Qu.: 80.00  
 Max.   :21.00   Max.   :90.00   Max.   :17.00                Max.   :200.00  
     polint             party         libcon       pdiscuss        natnews     
 Min.   :1.000   Democrat  :141   Min.   :1.0   Min.   :0.000   Min.   :0.000  
 1st Qu.:2.000   Republican:146   1st Qu.:3.0   1st Qu.:3.000   1st Qu.:1.000  
 Median :3.000   Other     : 53   Median :5.0   Median :7.000   Median :3.000  
 Mean   :2.818                    Mean   :4.5   Mean   :4.865   Mean   :3.412  
 3rd Qu.:3.000                    3rd Qu.:6.0   3rd Qu.:7.000   3rd Qu.:7.000  
 Max.   :4.000                    Max.   :7.0   Max.   :7.000   Max.   :7.000  
     npnews         locnews         talkrad           ses            
 Min.   :0.000   Min.   :0.000   Min.   :1.000   Min.   :-2.6200000  
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:-0.5850000  
 Median :3.000   Median :3.000   Median :1.000   Median :-0.0750000  
 Mean   :3.574   Mean   :2.966   Mean   :2.009   Mean   : 0.0003235  
 3rd Qu.:7.000   3rd Qu.:4.500   3rd Qu.:3.000   3rd Qu.: 0.5000000  
 Max.   :7.000   Max.   :7.000   Max.   :5.000   Max.   : 2.1000000  
      news          demoneg     
 Min.   :0.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:1.830  
 Median :3.080   Median :2.140  
 Mean   :3.314   Mean   :2.242  
 3rd Qu.:4.660   3rd Qu.:2.710  
 Max.   :7.000   Max.   :4.000

8.1.3 Data Description

National survey of residents of the United States.

Participants were asked a set of questions used to quantify their knowledge of politics, politicians, and the political process.

Assumption: knowledge is caused by exposure to information

8.1.3.1 Variables

Dependent Variable (outcome, Y)

pknow knowledge of the political process

Independent Variables (predictors or regressors, X’s)

Frequency of Exposure (days per week)…

talkrad listening to political talk radio
natnews watch national news broadcasts
npnews read newspaper
locnews watch local news

Covariates

age age in years, 18-90
sex sex, Male vs. Female

Categorical variables MUST be declared as FACTORS, and the FIRST level listed is treated as the reference category.

df_pol <- df_spss %>% 
  haven::as_factor() %>% 
  haven::zap_label() %>% 
  tibble::rowid_to_column(var = "id") %>% 
  dplyr::mutate(news_sum = (natnews + npnews)/2) %>% 
  dplyr::mutate(news_dif = (natnews - npnews)/2)

tibble::glimpse(df_pol)

Rows: 340
Columns: 19
$ id       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ pknow    <dbl> 8, 12, 11, 13, 14, 8, 10, 15, 9, 6, 8, 9, 10, 2, 9, 6, 4, 15,…
$ age      <dbl> 35, 40, 43, 26, 41, 41, 18, 31, 18, 72, 43, 43, 63, 32, 30, 2…
$ educ     <dbl> 13, 14, 14, 16, 12, 12, 12, 14, 11, 12, 10, 17, 12, 14, 16, 1…
$ sex      <fct> Female, Female, Male, Female, Female, Male, Female, Male, Fem…
$ income   <dbl> 42.5, 42.5, 110.0, 100.0, 57.5, 80.0, 42.5, 90.0, 30.0, 30.0,…
$ polint   <dbl> 2, 2, 2, 3, 2, 2, 2, 3, 3, 2, 2, 3, 3, 2, 3, 2, 1, 3, 2, 2, 3…
$ party    <fct> Democrat, Republican, Republican, Democrat, Republican, Democ…
$ libcon   <dbl> 3, 6, 4, 2, 3, 3, 2, 7, 5, 6, 5, 3, 3, 5, 3, 1, 6, 6, 5, 3, 5…
$ pdiscuss <dbl> 4, 3, 3, 2, 7, 3, 2, 7, 5, 7, 2, 0, 7, 7, 7, 3, 7, 7, 7, 5, 7…
$ natnews  <dbl> 3, 0, 1, 3, 0, 0, 2, 2, 3, 0, 3, 4, 7, 3, 1, 0, 3, 0, 2, 0, 5…
$ npnews   <dbl> 1, 4, 2, 5, 0, 7, 4, 0, 7, 7, 3, 1, 4, 1, 1, 3, 1, 7, 2, 7, 3…
$ locnews  <dbl> 3.5, 3.5, 3.5, 5.0, 5.0, 3.5, 1.5, 2.0, 7.0, 7.0, 0.5, 4.0, 3…
$ talkrad  <dbl> 4.5, 3.5, 3.5, 1.0, 1.0, 1.0, 1.0, 4.5, 1.0, 1.0, 1.0, 1.0, 2…
$ ses      <dbl> -0.58, -0.34, 0.50, 0.85, -0.63, -0.34, -0.81, 0.25, -1.21, -…
$ news     <dbl> 2.50, 2.50, 2.16, 4.33, 1.66, 3.50, 2.50, 1.33, 5.66, 4.66, 2…
$ demoneg  <dbl> 3.14, 2.85, 2.00, 2.00, 2.00, 2.00, 2.00, 2.57, 1.85, 2.57, 2…
$ news_sum <dbl> 2.0, 2.0, 1.5, 4.0, 0.0, 3.5, 3.0, 1.0, 5.0, 3.5, 3.0, 2.5, 5…
$ news_dif <dbl> 1.0, -2.0, -0.5, -1.0, 0.0, -3.5, -1.0, 1.0, -2.0, -3.5, 0.0,…

summary(df_pol)

       id             pknow            age             educ           sex     
 Min.   :  1.00   Min.   : 0.00   Min.   :18.00   Min.   : 6.00   Female:177  
 1st Qu.: 85.75   1st Qu.: 8.00   1st Qu.:35.00   1st Qu.:12.00   Male  :163  
 Median :170.50   Median :11.00   Median :43.00   Median :14.00               
 Mean   :170.50   Mean   :11.31   Mean   :44.93   Mean   :14.29               
 3rd Qu.:255.25   3rd Qu.:15.00   3rd Qu.:54.25   3rd Qu.:16.00               
 Max.   :340.00   Max.   :21.00   Max.   :90.00   Max.   :17.00               
     income           polint             party         libcon   
 Min.   :  2.50   Min.   :1.000   Democrat  :141   Min.   :1.0  
 1st Qu.: 42.50   1st Qu.:2.000   Republican:146   1st Qu.:3.0  
 Median : 57.50   Median :3.000   Other     : 53   Median :5.0  
 Mean   : 64.43   Mean   :2.818                    Mean   :4.5  
 3rd Qu.: 80.00   3rd Qu.:3.000                    3rd Qu.:6.0  
 Max.   :200.00   Max.   :4.000                    Max.   :7.0  
    pdiscuss        natnews          npnews         locnews     
 Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
 1st Qu.:3.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
 Median :7.000   Median :3.000   Median :3.000   Median :3.000  
 Mean   :4.865   Mean   :3.412   Mean   :3.574   Mean   :2.966  
 3rd Qu.:7.000   3rd Qu.:7.000   3rd Qu.:7.000   3rd Qu.:4.500  
 Max.   :7.000   Max.   :7.000   Max.   :7.000   Max.   :7.000  
    talkrad           ses                  news          demoneg     
 Min.   :1.000   Min.   :-2.6200000   Min.   :0.000   Min.   :1.000  
 1st Qu.:1.000   1st Qu.:-0.5850000   1st Qu.:2.000   1st Qu.:1.830  
 Median :1.000   Median :-0.0750000   Median :3.080   Median :2.140  
 Mean   :2.009   Mean   : 0.0003235   Mean   :3.314   Mean   :2.242  
 3rd Qu.:3.000   3rd Qu.: 0.5000000   3rd Qu.:4.660   3rd Qu.:2.710  
 Max.   :5.000   Max.   : 2.1000000   Max.   :7.000   Max.   :4.000  
    news_sum        news_dif       
 Min.   :0.000   Min.   :-3.50000  
 1st Qu.:1.500   1st Qu.:-1.00000  
 Median :3.500   Median : 0.00000  
 Mean   :3.493   Mean   :-0.08088  
 3rd Qu.:5.000   3rd Qu.: 1.00000  
 Max.   :7.000   Max.   : 3.50000
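As the summary shows, Female is listed first among the levels of sex, so it serves as the reference category in the models that follow. A minimal sketch of how one might inspect or change a reference level, using a small made-up factor (the object names `sex_toy` and `sex_male_ref` are hypothetical and not part of the politics data):

```r
# Toy factor for illustration only (not the politics data)
sex_toy <- factor(c("Female", "Male", "Male", "Female"),
                  levels = c("Female", "Male"))

# The first level is the reference category in lm()
levels(sex_toy)

# stats::relevel() moves a chosen level to the front
sex_male_ref <- stats::relevel(sex_toy, ref = "Male")
levels(sex_male_ref)
```

Releveling changes only which group the dummy-coded coefficient is compared against; the fit of the model is unchanged.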

8.2 RQ1) Fit Main Model

df_pol %>% 
  dplyr::select(pknow, natnews, npnews) %>% 
  apaSupp::tab_cor() %>% 
  flextable::hline(i = 2)

**Table 8.1**
Pairwise Correlations
Variable Pair		r	p
natnews	pknow	.150	.006**
npnews	pknow	.300	< .001***
npnews	natnews	.220	< .001***
Note. N = 340. r = Pearson's Product-Moment correlation coefficient.
* p < .05. ** p < .01. *** p < .001.

fit_pol_1 <- lm(pknow ~ natnews + npnews + age + sex,
                data = df_pol)

fit_pol_1nn <- lm(pknow ~ natnews + age + sex,
                data = df_pol)

fit_pol_1np <- lm(pknow ~ npnews + age + sex,
                data = df_pol)

apaSupp::tab_lms(list(fit_pol_1, fit_pol_1nn, fit_pol_1np),
                var_labels = c(natnews = "National News",
                               npnews = "Newspapers",
                               age = "Age",
                               sex = "Sex")) %>% 
  flextable::width(j = 1, width = 1.5)

**Table 8.2**
Compare Regression Models
	Model 1			Model 2			Model 3
Variable	b	(SE)	p	b	(SE)	p	b	(SE)	p
(Intercept)	8.79	(0.73)	< .001***	8.92	(0.75)	< .001***	8.87	(0.73)	< .001***
National News	0.15	(0.09)	.082	0.20	(0.09)	.025*
Newspapers	0.37	(0.08)	< .001***				0.39	(0.08)	< .001***
Age	-0.01	(0.02)	.578	0.01	(0.02)	.548	0.00	(0.02)	.968
Sex
Female	—	—		—	—		—	—
Male	2.2	(0.45)	< .001***	2.6	(0.45)	< .001***	2.2	(0.45)	< .001***
AIC	1919.7			1936.9			1920.8
BIC	1942.7			1956.0			1939.9
R²	.161			.112			.153
Adjusted R²	.150			.104			.145
Note.
* p < .05. ** p < .01. *** p < .001.

apaSupp::tab_lm(fit_pol_1,
                var_labels = c(natnews = "National News",
                               npnews = "Newspapers",
                               age = "Age",
                               sex = "Sex"),
                d = 3,
                vif = TRUE) %>% 
  flextable::width(j = 1, width = 1.5)

**Table 8.3**
Parameter Estimates for Linear Regression
	b	(SE)	p	b*	VIF	η²	ηₚ²
(Intercept)	8.795	(0.733)	< .0010***
National News	0.155	(0.089)	.0825	0.094	1.156	.0076	.0090
Newspapers	0.371	(0.084)	< .0010***	0.239	1.169	.0488	.0550
Age	-0.009	(0.017)	.5778	-0.031	1.222	.0008	.0009
Sex					1.037	.0636	.0705
Female	—	—
Male	2.25	(0.445)	< .0010***
R²	.1605
Adjusted R²	.1505
Note. N = 340. VIF = variance inflation factor; η² = semi-partial correlation; ηₚ² = partial correlation; b* = standardize coefficient; p = significance from Wald t-test for parameter estimate.
* p < .05. ** p < .01. *** p < .001.

8.2.1 Interpretation

8.3 RQ2) Parameter Equivalence

fit_pol_2 <- lm(pknow ~ news_sum + age + sex,
                data = df_pol)

fit_pol_3 <- lm(pknow ~ news_sum + news_dif + age + sex,
                data = df_pol)

apaSupp::tab_lms(list(fit_pol_1, fit_pol_2, fit_pol_3),
                 var_labels = c(news_sum = "News, sum",
                                news_dif = "News, dif",
                                age = "Age",
                                sex = "Sex"))

**Table 8.4**
Compare Regression Models
	Model 1			Model 2			Model 3
Variable	b	(SE)	p	b	(SE)	p	b	(SE)	p
(Intercept)	8.79	(0.73)	< .001***	8.78	(0.73)	< .001***	8.79	(0.73)	< .001***
natnews	0.15	(0.09)	.082
npnews	0.37	(0.08)	< .001***
Age	-0.01	(0.02)	.578	-0.01	(0.02)	.540	-0.01	(0.02)	.578
Sex
Female	—	—		—	—		—	—
Male	2.2	(0.45)	< .001***	2.3	(0.44)	< .001***	2.2	(0.45)	< .001***
News, sum				0.54	(0.11)	< .001***	0.53	(0.11)	< .001***
News, dif							-0.22	(0.13)	.097
AIC	1919.7			1920.5			1919.7
BIC	1942.7			1939.6			1942.7
R²	.161			.154			.161
Adjusted R²	.150			.146			.150
Note.
* p < .05. ** p < .01. *** p < .001.

car::linearHypothesis(fit_pol_1, c("natnews - npnews"))

# A tibble: 2 × 6
  Res.Df   RSS    Df `Sum of Sq`     F `Pr(>F)`
   <dbl> <dbl> <dbl>       <dbl> <dbl>    <dbl>
1    336 5487.    NA        NA   NA     NA     
2    335 5442.     1        45.1  2.78   0.0966

8.3.1 Interpretation

There is no evidence that the coefficients differ between national news and newspaper, F(1, 335) = 2.78, p = .097.
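The agreement between the news_dif test in Model 3 and the linearHypothesis() result is not a coincidence. A short derivation, writing $X_1$ = natnews and $X_2$ = npnews with slopes $b_1$ and $b_2$ (notation introduced here for illustration only):

```latex
\[
S = \frac{X_1 + X_2}{2}, \qquad D = \frac{X_1 - X_2}{2}
\quad \Rightarrow \quad X_1 = S + D, \qquad X_2 = S - D \\
b_1 X_1 + b_2 X_2 = (b_1 + b_2) \, S + (b_1 - b_2) \, D
\]
```

With the Table 8.3 estimates, $b_1 + b_2$ = 0.155 + 0.371 = 0.526 and $b_1 - b_2$ = 0.155 - 0.371 = -0.216, which match the "News, sum" (0.53) and "News, dif" (-0.22) rows of Table 8.4 up to rounding. Testing the news_dif coefficient against zero is therefore the same as testing $b_1 = b_2$, which is why both approaches report p = .097.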

8.4 RQ3) Variable Contribution

8.4.1 Relevant Statistics

8.4.1.1 Mean of the dependent variable:

\[ \bar{Y} = \frac{\sum Y_i}{n} = \text{mean of all observed } Y \text{ values}\\ \]

For the dependent variable “political knowledge” (pknow), $\bar{Y}$ = 11.31

mean(df_pol$pknow)

[1] 11.30882

8.4.1.2 Sample standard deviations, $s_Y$ and $s_X$’s

The amount of spread or variation in each measured variable

\[ s_Y = \text{standard deviation of } Y\\ s_X = \text{standard deviation of } X\\ \]

For the dependent variable “political knowledge” (pknow), $s_Y$ = 4.37

df_pol %>% 
  dplyr::select(pknow, npnews, locnews, talkrad, natnews) %>% 
  apaSupp::tab_desc(caption = "Summary of Political Knowledge and Each Predictor")

**Table 8.5**
Summary of Political Knowledge and Each Predictor
	NA	M	SD	min	Q1	Mdn	Q3	max
pknow	0	11.31	4.37	0.00	8.00	11.00	15.00	21.00
npnews	0	3.57	2.82	0.00	1.00	3.00	7.00	7.00
locnews	0	2.97	2.21	0.00	1.00	3.00	4.50	7.00
talkrad	0	2.01	1.29	1.00	1.00	1.00	3.00	5.00
natnews	0	3.41	2.65	0.00	1.00	3.00	7.00	7.00
Note. N = 340. NA = not available or missing; Mdn = median; Q1 = 25th percentile; Q3 = 75th percentile.

8.4.1.3 Pairwise Correlations

The linear association between pairs of variables

\[ r_{yx} = \text{correlation between Y and X} \\ \]

df_pol %>% 
  dplyr::select(pknow, npnews, locnews, talkrad, natnews) %>% 
  apaSupp::tab_cor(caption = "Pairwise Correlations Between Political Knowledge and Each Predictor ") %>% 
  flextable::hline(i = 4)

**Table 8.6**
Pairwise Correlations Between Political Knowledge and Each Predictor
Variable Pair		r	p
npnews	pknow	.300	< .001***
locnews	pknow	< .001	.050*
talkrad	pknow	.260	< .001***
natnews	pknow	.150	.006**
locnews	npnews	.170	.002**
talkrad	npnews	.014	.790
natnews	npnews	.220	< .001***
talkrad	locnews	.057	.297
natnews	locnews	.420	< .001***
natnews	talkrad	.190	< .001***
Note. N = 340. r = Pearson's Product-Moment correlation coefficient.
* p < .05. ** p < .01. *** p < .001.

8.4.1.4 Predicted Values

Regression estimates or conditional means for the dependent variable:

\[ \hat{Y_i} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} = \text{ predicted } Y \text{ values}\\ \]

For Model fit_pol_1, here is a summary of the fitted (predicted) values

summary(fitted(fit_pol_1))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  8.324   9.667  11.448  11.309  12.692  14.432

8.4.1.5 Sum of Squares

(see Darlington & Hayes section 4.2.2 on pages 99-100)

\[ \text{data} = \text{model} + \text{error} \\ \text{total} = \text{regression} + \text{residual} \\ SS_{total} = SS_{regression} + SS_{residuals} \]

$SS_{total}$ = sum of the squared deviations (observed Y - mean Y)
$SS_{residual}$ = sum of the squared residuals (observed Y - predicted Y)
$SS_{regression}$ = sum of the squared regression deviations (predicted Y - mean Y)

\[ SS_{total}= \sum (Y_i - \bar{Y})^2 \\ SS_{regression} = \sum (\hat{Y_i} - \bar{Y})^2 \\ SS_{residuals} = \sum (Y_i - \hat{Y_i})^2 \\ \]

$SS_{total}$ = sum of the squared “deviations”
Differences between each observed $Y$ value and the mean of all $Y$ values

For Model fit_pol_1, $SS_{total}$ = 6482.574

df_pol %>% 
  dplyr::mutate(mean_pknow = mean(pknow)) %>% 
  dplyr::mutate(dev_pknow_sq = (pknow - mean_pknow)^2) %>% 
  dplyr::pull(dev_pknow_sq) %>% 
  sum()

[1] 6482.574

ss_total <- var(df_pol$pknow)*(340 - 1)
ss_total

[1] 6482.574

$SS_{residual}$ = sum of the squared “residuals”
Differences between each observed $Y$ value and the corresponding predicted value

For Model fit_pol_1, $SS_{residual}$ = 5442.07

deviance(fit_pol_1)

[1] 5442.07

ss_residuals <- sum(residuals(fit_pol_1)^2)
ss_residuals

[1] 5442.07

$SS_{regression}$ = find by subtraction

\[ SS_{regression} = SS_{total} - SS_{residuals} \]

ss_regression <- ss_total - ss_residuals 
ss_regression

[1] 1040.503
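The decomposition can also be checked without subtraction by computing all three sums of squares directly. A minimal sketch, using the built-in mtcars data as a stand-in for df_pol (an assumption, so that the block runs without politics.sav; object names ending in `_demo` are hypothetical):

```r
# Stand-in model on built-in data (illustration only, not the politics data)
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)
y_demo   <- mtcars$mpg

ss_total_demo      <- sum((y_demo - mean(y_demo))^2)            # observed - mean
ss_regression_demo <- sum((fitted(fit_demo) - mean(y_demo))^2)  # predicted - mean
ss_residuals_demo  <- sum(residuals(fit_demo)^2)                # observed - predicted

# TRUE for any least-squares model that includes an intercept
all.equal(ss_total_demo, ss_regression_demo + ss_residuals_demo)
```

The same three lines should work with fit_pol_1 and df_pol$pknow in place of the stand-in objects.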

8.4.1.6 Degrees of Freedom

(see Darlington & Hayes section 4.2.3 on pages 99-100)

\[ df_{total} = n - 1 \\ df_{regression} = k \\ df_{residual} = n - (k + 1) \]

For Model fit_pol_1:

$n$ = 340, the sample size
$k$ = 4, the number of predictors in the model
$df_{total}$ = 340 - 1 = 339
$df_{regression}$ = 4
$df_{residual}$ = 340 - (4 + 1) = 335

df_total <- 340 - 1 
df_regression <- 4
df_residual <- 340 - (4 + 1)

anova(fit_pol_1)

# A tibble: 5 × 5
     Df `Sum Sq` `Mean Sq` `F value`     `Pr(>F)`
  <int>    <dbl>     <dbl>     <dbl>        <dbl>
1     1   142.      142.       8.75   0.00331    
2     1   479.      479.      29.5    0.000000107
3     1     6.33      6.33     0.389  0.533      
4     1   413.      413.      25.4    0.000000764
5   335  5442.       16.2     NA     NA

8.4.1.7 Mean Squares

(see Darlington & Hayes section 4.2.4 on pages 100-102)

The name “mean square” is rather unfortunate given that neither of these is an actual mean of the squared components. These statistics have the property that they are generally less influenced by adding regressors or cases to a model than are the sums of squares.

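\[ MS_{regression} = \frac{SS_{regression}}{df_{regression}} = \frac{SS_{regression}}{k} \\ MS_{residual} = \frac{SS_{residual}}{df_{residual}} = \frac{SS_{residual}}{n - k - 1} \]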

The mean squared residual, also called the mean squared error and often abbreviated MSE, is an unbiased estimator of the variance of the errors in estimation of $Y$, which we denoted $Var(Y|X)$ in Chapter 2.

That is, suppose you wanted to know the amount, on average, $\hat{Y}$ tends to differ from $Y$ when the model is fitted to the entire **population** (or in a sample of infinite size). $MS_{residual}$ is generally used in statistics as an important estimator of the square of this quantity.

For Model fit_pol_1, $MS_{residual}$ = 5442.07 / 335 = 16.24

ms_residuals <- ss_residuals/df_residual
ms_residuals

[1] 16.24499

summary(fit_pol_1)$sigma^2

[1] 16.24499
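Plugging the sums of squares and degrees of freedom reported above for fit_pol_1 into the definition of each mean square (a sum of squares divided by its degrees of freedom) gives a worked check, with values rounded:

```latex
\[
MS_{regression} = \frac{SS_{regression}}{df_{regression}} = \frac{1040.503}{4} \approx 260.126 \\
MS_{residual} = \frac{SS_{residual}}{df_{residual}} = \frac{5442.07}{335} \approx 16.245
\]
```

The second line reproduces the value returned by both ms_residuals and summary(fit_pol_1)$sigma^2.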

8.4.1.8 Coefficient of Determination, $R^2$

Proportion of TOTAL variance in $Y$ explained by all the predictors

\[ R^2 = 1 - \frac{SS_{residual}}{SS_{total}} = \frac{SS_{regression}}{SS_{total}} \]

For Model fit_pol_1, $R^2$ = .161

1 - (ss_residuals/ss_total)

[1] 0.1605077

ss_regression/ss_total

[1] 0.1605077

summary(fit_pol_1)$r.squared

[1] 0.1605077
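Table 8.3 also reports an adjusted $R^2$ of .1505. A minimal sketch of how that value relates to $R^2$, assuming the usual adjustment formula and plugging in the numbers printed above (the object names are hypothetical):

```r
r2_demo <- 0.1605077  # R-squared of fit_pol_1, copied from the output above
n_demo  <- 340        # sample size
k_demo  <- 4          # number of regressors

# Adjusted R-squared penalizes R-squared for the number of regressors
adj_r2_demo <- 1 - (1 - r2_demo) * (n_demo - 1) / (n_demo - k_demo - 1)
round(adj_r2_demo, 4)
```

The result rounds to .1505, in agreement with Table 8.3.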

8.4.2 Fit a Sequence of Models

We treat political knowledge (pknow) as the dependent variable $Y$ and determine whether listening to political talk radio (talkrad) is more or less important than watching the national network news broadcast (natnews) in explaining individual differences in political knowledge.

We do this in the context of a full model that includes all four sources of information as regressors. So $k = 4$. We’ll call political talk radio regressor $j$ and national network news use regressor $i$.

The two remaining regressors, reading the newspaper (npnews) and watching the local news broadcast (locnews), are defined as set A.

Note: listening to political talk radio (talkrad) is the participant’s average response on an ordinal scale to two questions about how often he or she listens to political talk radio and how much attention he or she pays when listening. But watching the network news (natnews) is measured as the number of days per week the person watches the national network news broadcast.

8.4.2.1 Set A predictors: newspaper and local news

fit_pol_2a <- lm(pknow ~ npnews + locnews,
                 data = df_pol)

r2_a <- summary(fit_pol_2a)$r.squared
r2_a

[1] 0.1141542

8.4.2.2 Set A -PLUS- talk radio

fit_pol_2at <- lm(pknow ~ npnews + locnews + talkrad ,
                  data = df_pol)

r2_at <- summary(fit_pol_2at)$r.squared
r2_at

[1] 0.1847925

8.4.2.3 Set A -PLUS- national news

fit_pol_2an <- lm(pknow ~ npnews + locnews + natnews,
                  data = df_pol)

r2_an <- summary(fit_pol_2an)$r.squared
r2_an

[1] 0.1394805

8.4.2.4 Set A -PLUS- BOTH talk radio AND national news

fit_pol_2atn <- lm(pknow ~ npnews + locnews + talkrad + natnews,
                 data = df_pol)

r2_atn  <- summary(fit_pol_2atn)$r.squared
r2_atn

[1] 0.1971275

8.4.3 Variable Importance

There are MANY measures of “Variable Importance”

Note:
1 & 2 are not recommended
2-4 may be referred to as “effect size” in general

8.4.3.1 1. Sample Regression Coefficients, $b$

also called regression weights

Interpretation: $b$ is the average change in $Y$ for a 1-unit change in $X$
Pro: estimated in all regression models
Cons: scale dependent, so hard to compare predictors with different units

\[ b = r_{yx}\frac{s_y}{s_x} \tag{EQ 8-1} \]

For Model fit_pol_2atn:

coef(fit_pol_2atn)

(Intercept)      npnews     locnews     talkrad     natnews 
  8.5565731   0.4716805  -0.4440727   0.8324193   0.2086020

8.4.3.2 2. Standardized Regression Coefficients, $b^*$

Interpretation: $b^*$ is the average SD change in $Y$ for a 1-SD change in $X$ while holding all other predictors constant
Pros: scale-free measure
Cons: meaningless for categorical predictors, places all predictors on the same scale

Problem: if predictors are correlated, it is hard to justify changing one predictor while holding another constant

\[ b^* = b \frac{s_x}{s_y} \tag{EQ 8-2} \]

For Model fit_pol_2atn:

parameters::standardise_parameters(fit_pol_2atn)

# A tibble: 5 × 5
  Parameter   Std_Coefficient    CI  CI_low CI_high
  <chr>                 <dbl> <dbl>   <dbl>   <dbl>
1 (Intercept)       -8.58e-17  0.95 -0.0962  0.0962
2 npnews             3.04e- 1  0.95  0.205   0.403 
3 locnews           -2.24e- 1  0.95 -0.331  -0.118 
4 talkrad            2.45e- 1  0.95  0.147   0.343 
5 natnews            1.26e- 1  0.95  0.0168  0.236
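EQ 8-2 can be checked by hand against this output. A minimal sketch that plugs in the unstandardized coefficients printed earlier and the standard deviations from Table 8.5 (rounded to two decimals, so agreement is only approximate; the object names are hypothetical):

```r
# Unstandardized slopes copied from coef(fit_pol_2atn)
b_demo <- c(npnews = 0.4716805, locnews = -0.4440727,
            talkrad = 0.8324193, natnews = 0.2086020)

# Standard deviations copied from Table 8.5 (two-decimal rounding)
s_x_demo <- c(npnews = 2.82, locnews = 2.21, talkrad = 1.29, natnews = 2.65)
s_y_demo <- 4.37

# EQ 8-2: b* = b * s_x / s_y
b_std_demo <- b_demo * s_x_demo / s_y_demo
round(b_std_demo, 2)
```

The hand-computed values agree with the Std_Coefficient column to within rounding of the standard deviations.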

8.4.3.3 3. Semi-partial Correlation, $sr$

Interpretation: $\eta^2$ is the proportion of TOTAL variance in $Y$ uniquely explained by $X$
Pros: adjusts for other covariates
Cons: dependent on what covariates are included

\[ B = \text{the variable of interest} \\ A = \text{set of all other variables} \\ R^2_{AB} = \text{proportion of variance accounted for by } A \text{ and } B \text{ together} \\ R^2_{A} = \text{proportion of variance accounted for by } A \\ sr_{B|A} = \text{semi-partial correlation for variable } B \text{ controlling for } A \\ \eta^2_{B|A} = \text{eta-squared for variable } B \text{ controlling for } A \]

\[ sr^2_{B|A} = \eta^2_{B|A} = R^2_{AB} - R^2_A \]

Another formula:

\[ k = \text{number of predictors in } A \text{ and } B \\ N = \text{sample size} \\ t_B = \frac{b_B}{SE_B}\\ sr_B = t_B \sqrt{\frac{1-R^2_{AB}}{N - k - 1}} \]

For Model fit_pol_2atn:

DescTools::EtaSq(fit_pol_2atn) %>% 
  data.frame() %>% 
  dplyr::select(eta.sq) %>% 
  dplyr::mutate(sr = sqrt(eta.sq))

# A tibble: 4 × 2
  eta.sq    sr
   <dbl> <dbl>
1 0.0870 0.295
2 0.0412 0.203
3 0.0576 0.240
4 0.0123 0.111
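The eta.sq values for the last two regressors can be recovered from the R-squared values of the model sequence fitted earlier, using $sr^2_{B|A} = R^2_{AB} - R^2_A$ with the full four-regressor model as $AB$. A minimal sketch with the printed R-squared values typed in by hand (names ending in `_demo` are hypothetical):

```r
# R-squared values copied from the model sequence above
r2_at_demo  <- 0.1847925  # set A + talk radio
r2_an_demo  <- 0.1394805  # set A + national news
r2_atn_demo <- 0.1971275  # set A + talk radio + national news

# Squared semi-partial correlation = drop in R-squared when the regressor is removed
sr2_talkrad_demo <- r2_atn_demo - r2_an_demo
sr2_natnews_demo <- r2_atn_demo - r2_at_demo

round(c(talkrad = sr2_talkrad_demo, natnews = sr2_natnews_demo), 4)
```

These round to 0.0576 and 0.0123, matching rows 3 and 4 of the eta.sq column.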

8.4.3.4 4. Partial Correlation, $pr$

Interpretation: $\eta^2_p$ is the proportion of the variance in $Y$ left unexplained by the other predictors that is uniquely explained by $X$
Pros: adjusts for other covariates
Cons: dependent on what covariates are included

\[ pr^2_{B|A} = \eta^2_{p(B|A)} = \frac{R^2_{AB} - R^2_A}{1 - R^2_A} \]

For Model fit_pol_2atn:

DescTools::EtaSq(fit_pol_2atn) %>% 
  data.frame() %>% 
  dplyr::select(eta.sq.part) %>% 
  dplyr::mutate(pr = sqrt(eta.sq.part))

# A tibble: 4 × 2
  eta.sq.part    pr
        <dbl> <dbl>
1      0.0978 0.313
2      0.0488 0.221
3      0.0670 0.259
4      0.0151 0.123
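The partial values differ from the semi-partial ones only in the denominator: the same increase in R-squared is divided by the proportion of variance that set $A$ leaves unexplained. A minimal sketch using the R-squared values printed for the model sequence, typed in by hand (names ending in `_demo` are hypothetical):

```r
# R-squared values copied from the model sequence above
r2_at_demo  <- 0.1847925  # set A + talk radio
r2_an_demo  <- 0.1394805  # set A + national news
r2_atn_demo <- 0.1971275  # set A + talk radio + national news

# Squared partial correlation: unique contribution / variance unexplained without B
pr2_talkrad_demo <- (r2_atn_demo - r2_an_demo) / (1 - r2_an_demo)
pr2_natnews_demo <- (r2_atn_demo - r2_at_demo) / (1 - r2_at_demo)

round(c(talkrad = pr2_talkrad_demo, natnews = pr2_natnews_demo), 4)
```

These round to 0.0670 and 0.0151, matching rows 3 and 4 of the eta.sq.part column.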

8.4.3.5 5. Cohen’s $f$-squared, $f^2$

Interpretation: for a single variable or set of variables $B$, it is the RATIO between the proportion of the variance in $Y$ UNIQUELY explained by $B$ AND the proportion of variance in $Y$ unexplained by ANY variable in the model ($A$ and $B$)
Con: ranges from 0 to infinity (no upper bound), can be greater than 1

Cohen (1988) suggested that $f^2$ values of 0.02, 0.15, and 0.35 represent small, medium, and large effect sizes, respectively; the values 0.10, 0.25, and 0.40 are his benchmarks for $f$, not $f^2$.

\[ f^2_{B|A} = \frac{R^2_{AB} - R^2_A}{1 - R^2_{AB}} \tag{EQ 8-2} \]

Calculate for each predictor in a model. It says “Type I” since it works sequentially in the order the predictors are listed in the regression formula

In this case, each predictor is treated as $B$ while the predictors in the lines ABOVE it are treated as set $A$.

Cohen’s $f^2$ for talk radio added to set $A$ (newspaper and local news)

r2_at

[1] 0.1847925

r2_a

[1] 0.1141542

f2_a_t = (r2_at - r2_a)/(1 - r2_at)
f2_a_t

[1] 0.08665069

Cohen’s $f^2$ for national news added to set $A$ and talk radio

r2_at

[1] 0.1847925

r2_atn

[1] 0.1971275

f2_at_n = (r2_atn - r2_at)/(1 - r2_atn)
f2_at_n

[1] 0.01536

order: npnews + locnews + talkrad (f2_a_t) + natnews

lm(pknow ~ npnews + locnews + talkrad + natnews,
                 data = df_pol) %>% 
  effectsize::cohens_f_squared() %>% 
  print(digits = 3)

# Effect Size for ANOVA (Type I)

Parameter | Cohen's f2 (partial) |       95% CI
-----------------------------------------------
npnews    |                0.111 | [0.058, Inf]
locnews   |                0.031 | [0.007, Inf]
talkrad   |                0.088 | [0.042, Inf]
natnews   |                0.015 | [0.001, Inf]

- One-sided CIs: upper bound fixed at [Inf].

order: npnews + locnews + natnews + talkrad

lm(pknow ~ npnews + locnews + natnews + talkrad,
                 data = df_pol) %>% 
  effectsize::cohens_f_squared(partial = TRUE) %>% 
  print(digits = 3)

# Effect Size for ANOVA (Type I)

Parameter | Cohen's f2 (partial) |       95% CI
-----------------------------------------------
npnews    |                0.111 | [0.058, Inf]
locnews   |                0.031 | [0.007, Inf]
natnews   |                0.032 | [0.008, Inf]
talkrad   |                0.072 | [0.031, Inf]

- One-sided CIs: upper bound fixed at [Inf].
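The two tables differ only in the rows for talkrad and natnews, because a sequential (Type I) analysis credits each regressor with the increase in $R^2$ at the point where it enters the model. The printed values appear consistent with dividing that sequential increase by the unexplained proportion in the full four-regressor model; a worked check under that assumption, using the R-squared values from the model sequence above:

```latex
\[
\text{talkrad entered third:} \quad \frac{0.1847925 - 0.1141542}{1 - 0.1971275} \approx 0.088 \\
\text{talkrad entered last:} \quad \frac{0.1971275 - 0.1394805}{1 - 0.1971275} \approx 0.072 \\
\text{natnews entered third:} \quad \frac{0.1394805 - 0.1141542}{1 - 0.1971275} \approx 0.032 \\
\text{natnews entered last:} \quad \frac{0.1971275 - 0.1847925}{1 - 0.1971275} \approx 0.015
\]
```

A regressor entered earlier is credited with any variance it shares with the regressors that follow it, which is why both talkrad and natnews look more important when listed third than when listed last.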

When comparing two models in a sequential regression analysis, Cohen’s $f^2$ for R-square change is the RATIO between the INCREASE in R-square and the proportion of variance left unexplained by the larger model. Thus, the numerator of (EQ 8-2) reflects the proportion of variance uniquely accounted for by $B$, over and above that of all other variables (Cohen, 1988).

This variation of Cohen’s $f^2$, which measures local effect size, is much more relevant to a research question in which a single variable or set of variables ($B$) is added to other variables (set $A$).

\[ f^2_{B|A} = \frac{R^2_{AB} - R^2_A}{1 - R^2_{AB}} = \frac{\Delta R^2}{1 - R^2_{AB}} \tag{EQ 8-2} \]

Effect of adding talk radio to set A

effectsize::cohens_f_squared(model  = fit_pol_2at, 
                             model2 = fit_pol_2a) %>% 
  print(digits = 3)

Cohen's f2 (partial) |       95% CI | R2_delta
----------------------------------------------
0.087                | [0.041, Inf] |    0.071

- One-sided CIs: upper bound fixed at [Inf].

Effect of adding national news to set A

effectsize::cohens_f_squared(model = fit_pol_2an, 
                             model2 = fit_pol_2a) %>% 
  print(digits = 3)

Cohen's f2 (partial) |       95% CI | R2_delta
----------------------------------------------
0.029                | [0.007, Inf] |    0.025

- One-sided CIs: upper bound fixed at [Inf].

Effect of adding national news to set A and talk radio

effectsize::cohens_f_squared(model  = fit_pol_2atn, 
                             model2 = fit_pol_2at) %>% 
  print(digits = 3)

Cohen's f2 (partial) |       95% CI | R2_delta
----------------------------------------------
0.015                | [0.001, Inf] |    0.012

- One-sided CIs: upper bound fixed at [Inf].

Effect of adding BOTH talk radio AND national news to set A

effectsize::cohens_f_squared(model  = fit_pol_2atn, 
                             model2 = fit_pol_2a) %>% 
  print(digits = 3)

Cohen's f2 (partial) |       95% CI | R2_delta
----------------------------------------------
0.103                | [0.050, Inf] |    0.083

- One-sided CIs: upper bound fixed at [Inf].
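The four comparisons above all apply EQ 8-2 to a pair of nested models. A minimal sketch of a helper that does the same arithmetic from two R-squared values (the function name `f2_change()` is hypothetical and not part of effectsize; the inputs are the printed R-squared values typed in by hand):

```r
# Cohen's f-squared for adding regressors B to a model that contains set A
# r2_full = R-squared with A and B; r2_reduced = R-squared with A only
f2_change <- function(r2_full, r2_reduced) {
  (r2_full - r2_reduced) / (1 - r2_full)
}

r2_a_demo   <- 0.1141542  # set A
r2_at_demo  <- 0.1847925  # set A + talk radio
r2_an_demo  <- 0.1394805  # set A + national news
r2_atn_demo <- 0.1971275  # set A + talk radio + national news

round(c(talkrad_given_A         = f2_change(r2_at_demo,  r2_a_demo),
        natnews_given_A         = f2_change(r2_an_demo,  r2_a_demo),
        natnews_given_A_talkrad = f2_change(r2_atn_demo, r2_at_demo),
        both_given_A            = f2_change(r2_atn_demo, r2_a_demo)), 3)
```

These round to 0.087, 0.029, 0.015, and 0.103, the values reported by effectsize::cohens_f_squared() for the same four comparisons.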

8.4.3.6 6. Standard Error of Estimate

Interpretation: the typical size of the errors made when estimating $Y$ from the model (the estimated conditional means); smaller values indicate better models
Pros: weights the entire model, all predictors
Cons: dependent on what predictors are included

The standard error of estimate ($s_{Y|X} = \sqrt{MS_{residual}}$) is printed as a matter of routine by many regression programs. It is an estimator of the standard deviation of the errors in estimate.

As you know, means, regression coefficients, and other statistics have their own standard errors. These usually decline with sample size. But the standard error of estimate does not decline with increasing sample size, because we are estimating a value for each participant rather than a single value for the entire population.

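$$ s_{Y|A} = \sqrt{MSE_{A}} $$

Here $MSE_{A}$ is the residual mean square when $Y$ is regressed on set $A$, and $s_{Y|A}$ is the corresponding standard error of estimate.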

One way of measuring the quality of a prediction system is how large the errors in estimation tend to be. The standard error of estimate, first introduced in section 4.2.4, is widely used as a measure of this.

The SMALLER $s_{Y|X}$, the “BETTER” the model, in the sense that the model generates estimates of $Y$ that are closer to $Y$ than some other model of the same $Y$ with a bigger $s_{Y|X}$.

In a model with a single predictor $X$ of $Y$, the standard error of estimate is related to $r_{XY}$ by the formula:

$$ s_{Y|X} = s_Y \sqrt{\left(1 - r^2_{XY}\right) \frac{N - 1}{N - 2}} \approx s_Y \sqrt{1 - r^2_{XY}} $$
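With one predictor, the standard error of estimate can be rebuilt from $s_Y$ and $r_{XY}$ as $s_Y \sqrt{(1 - r^2_{XY})(N - 1)/(N - 2)}$. The sketch below checks this on the built-in mtcars data, which is a stand-in for illustration and not the politics data; the object names are hypothetical.

```r
# Single-predictor check: sigma from lm() versus the closed-form expression.
# mtcars (built in) stands in for the politics data.
fit <- lm(mpg ~ wt, data = mtcars)

n <- nrow(mtcars)
r <- cor(mtcars$mpg, mtcars$wt)

see_lm      <- summary(fit)$sigma
see_formula <- sd(mtcars$mpg) * sqrt((1 - r^2) * (n - 1) / (n - 2))

all.equal(see_lm, see_formula)

# Large-sample shortcut that drops the degrees-of-freedom correction
see_approx <- sd(mtcars$mpg) * sqrt(1 - r^2)
c(exact = see_lm, approx = see_approx)
```

The shortcut is always a little smaller than the exact value, and the gap shrinks as $N$ grows.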

In all models, with $A$ equal to the set of predictors in the regression, the standard error of estimate is given by this formula:

\[ s_{Y|A} = \sqrt{MS_{residual}} = \hat{\sigma}_{Y|A} \]

Predictors = Set A (newspaper and local news)

see_a <- summary(fit_pol_2a)$sigma
see_a

[1] 4.127982

In large samples, $s_{Y|X}$ is very close to the standard deviation of the residuals ($SD(residuals)$).

sd(residuals(fit_pol_2a))

[1] 4.115787
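The small gap between the two values comes from the divisor: sd() divides the residual sum of squares by $N - 1$, whereas the standard error of estimate divides by the residual degrees of freedom, $N - k - 1$. As a worked check, assuming all $N = 340$ rows enter the model and set A holds $k = 2$ predictors:

\[ s_{Y|A} = SD(residuals) \sqrt{\frac{N - 1}{N - k - 1}} = 4.115787 \sqrt{\frac{339}{337}} \approx 4.12798 \]

This agrees with the value of summary(fit_pol_2a)$sigma printed above.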

Predictors = Set A -PLUS- talk radio

see_at <- summary(fit_pol_2at)$sigma
see_at

[1] 3.965867

Predictors = Set A -PLUS- national news

see_an <- summary(fit_pol_2an)$sigma
see_an

[1] 4.074595

Predictors = Set A -PLUS- national news AND talk radio

see_atn <- summary(fit_pol_2atn)$sigma
see_atn

[1] 3.941619

8.4.3.7 7. Coefficient of Forecasting Efficiency

Interpretation: proportional reduction in the standard error of estimate when using the relationship between $X$ and $Y$
Pro: ranges between 0 and 1, higher = more important
Con: meaningful increase may be small

For a single predictor:

\[ E = 1 - \sqrt{1 - r^2_{XY}} \tag{1 predictor X} \]

For adding one or more predictors $B$ to set $A$:

\[ E_B = \frac{\sqrt{MSE_A} - \sqrt{MSE_{AB}}}{\sqrt{MSE_A}} \tag{B added to A} \]

Talk Radio

unadjusted correlation between talk radio and political knowledge ($Y$)

r_t <- cor(df_pol$pknow, df_pol$talkrad)
r_t

[1] 0.260923

proportion reduction in the standard error of estimate when using talk radio only

cfe_t <- 1 - sqrt(1 - r_t^2)
cfe_t

[1] 0.03464038

proportion reduction in the standard error of estimate when using talk radio, if already considering newspaper and local news

(see_a - see_at)/see_a

[1] 0.03927226

National News

unadjusted correlation between national news and political knowledge

r_n <- cor(df_pol$pknow, df_pol$natnews)
r_n

[1] 0.1480993

proportion reduction in the standard error of estimate when using national news

cfe_n <- 1 - sqrt(1 - r_n^2)
cfe_n

[1] 0.0110275

proportion reduction in the standard error of estimate when using national news if already considering newspaper and local news

(see_a - see_an)/see_a

[1] 0.01293304
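The same calculation can be wrapped in a small helper and applied to every model in the sequence, including the one that adds both regressors at once. This is only a sketch: cfe() is a hypothetical helper name, and the standard errors of estimate are copied by hand from the output printed in the previous section rather than pulled from the fitted models.

```r
# Hypothetical helper (not from the source): proportional reduction in the
# standard error of estimate when moving from a smaller to a larger model.
cfe <- function(see_small, see_large) (see_small - see_large) / see_small

# Standard errors of estimate copied from the printed output above
see_a   <- 4.127982  # set A
see_at  <- 3.965867  # set A + talk radio
see_an  <- 4.074595  # set A + national news
see_atn <- 3.941619  # set A + talk radio + national news

cfe_added <- c(talkrad = cfe(see_a, see_at),
               natnews = cfe(see_a, see_an),
               both    = cfe(see_a, see_atn))
round(cfe_added, 4)
```

The first two entries reproduce the 0.0393 and 0.0129 shown above; the third gives the reduction when both regressors join set A together.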

8.4.3.8 8. Change in R-squared

Interpretation: Change in the total variance in $Y$ explained when $X$ is added to the model
Con: dependent on what was previously in the model

With set A defined as two regressors (npnews and locnews), there are $2^{4 - 2} = 4$ subsets of these two regressors. Those four subsets can be found in the rows of Table 8.2. For each subset, we calculate $R$ three times, regressing $Y$ on:

just the variables in the A subset,
A subset plus regressor $i$ (talkrad),
A subset plus regressor $j$ (natnews).

Importantly, we do not calculate $R$ when both regressor $i$ and $j$ are in the model. With these computations done, we can derive $\Delta R_i$ and $\Delta R_j$ in each of the four subsets. Table 8.2 shows these computations.

Recreating Table 8.2 found at the top of page 237 of the Darlington & Hayes textbook.

# One row per subset of set A, with the formula for its base model
data.frame(A = c("None",
                 "Newspaper",
                 "Local News",
                 "Both"),
           base = c("pknow ~ 1", 
                    "pknow ~ npnews", 
                    "pknow ~ locnews", 
                    "pknow ~ npnews+locnews")) %>% 
  # Formulas that add regressor i (talkrad) or regressor j (natnews)
  dplyr::mutate(add_talk = paste0(base, "+talkrad")) %>% 
  dplyr::mutate(add_natn = paste0(base, "+natnews")) %>% 
  # Fit the base model and the two augmented models for each subset
  dplyr::mutate(fit_base = purrr::map(base,
                                      ~lm(as.formula(.x), data = df_pol))) %>%
  dplyr::mutate(fit_talk = purrr::map(add_talk,
                                      ~lm(as.formula(.x), data = df_pol))) %>%
  dplyr::mutate(fit_natn = purrr::map(add_natn,
                                      ~lm(as.formula(.x), data = df_pol))) %>%
  # R-squared, then multiple R, for each model
  dplyr::mutate(R2 = purrr::map_dbl(fit_base,
                                    ~ broom::glance(.x)$r.squared)) %>% 
  dplyr::mutate(R2i = purrr::map_dbl(fit_talk,
                                     ~ broom::glance(.x)$r.squared)) %>% 
  dplyr::mutate(R2j = purrr::map_dbl(fit_natn,
                                     ~ broom::glance(.x)$r.squared)) %>% 
  dplyr::mutate(R  = sqrt(R2)) %>% 
  dplyr::mutate(Ri = sqrt(R2i)) %>% 
  dplyr::mutate(Rj = sqrt(R2j)) %>% 
  # Change in R when regressor i or j joins the subset
  dplyr::mutate(dRi = Ri - R) %>% 
  dplyr::mutate(dRj = Rj - R) %>% 
  dplyr::select("Set A subset" = A,
                "R" = R, 
                "Adding i\nTalk Radio\nR" = Ri,
                "Adding j\nNational News\nR" = Rj,
                "Talk Radio, i\nChange\nin R" = dRi,
                "National News, j\nChange\nin R" = dRj) %>% 
  flextable::flextable() %>% 
  apaSupp::theme_apa(caption = "D&H Table 8.2 - Relative Improvement in Fit for Dominance Computations",
                     d = 3)

**Table 8.7**
D&H Table 8.2 - Relative Improvement in Fit for Dominance Computations
Set A subset	R	Adding i Talk Radio R	Adding j National News R	Talk Radio, i Change in R	National News, j Change in R
None	0.000	0.261	0.148	0.261	0.148
Newspaper	0.298	0.393	0.310	0.393	0.310
Local News	0.106	0.288	0.237	0.288	0.237
Both	0.338	0.430	0.373	0.430	0.373

As can be seen, in all four models (ROWS) defined by subsets of newspaper reading and local news use, adding talk radio use to the model INCREASES $R$ MORE than does watching the national network news.

Never does the addition of watching the national network news improve model fit more than listening to political talk radio. So talk radio use completely dominates watching the national network news in explaining variation in political knowledge.
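The complete-dominance claim can be checked mechanically. The sketch below assumes nothing beyond the multiple $R$ values printed in the first three numeric columns of the table; because those values are rounded to three decimals, the differences are approximate. The object and column names are hypothetical.

```r
# Multiple R values copied from the printed table (rounded to 3 decimals,
# so the changes computed from them are approximate).
tab <- data.frame(
  subset = c("None", "Newspaper", "Local News", "Both"),
  R      = c(0.000, 0.298, 0.106, 0.338),  # subset of set A only
  R_talk = c(0.261, 0.393, 0.288, 0.430),  # subset plus talk radio (i)
  R_natn = c(0.148, 0.310, 0.237, 0.373)   # subset plus national news (j)
)

# Change in R from adding each regressor to the subset
tab$dR_talk <- tab$R_talk - tab$R
tab$dR_natn <- tab$R_natn - tab$R
tab

# Complete dominance requires talk radio to win in every one of the four rows
all(tab$dR_talk > tab$dR_natn)
```

Because both regressors are added to the same base model in each row, comparing the changes in $R$ gives the same ordering as comparing the $R$ values themselves.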

8.4.4 Relative Improvement in Fit

The calc.relimp() function in the relaimpo package calculates several relative importance metrics for the linear model. The recommended metric is type = "lmg" ($R^2$ partitioned by averaging over orders, as in Lindeman, Merenda and Gold, 1980, p. 119). For completeness and comparison purposes, several other metrics are also on offer (cf. e.g. Darlington (1968)).

fit_pol_2atn %>% 
  relaimpo::calc.relimp(type = "lmg", importance = TRUE)

Response variable: pknow 
Total response variance: 19.12264 
Analysis based on 340 observations 

4 Regressors: 
npnews locnews talkrad natnews 
Proportion of variance explained by model: 19.71%
Metrics are not normalized (rela=FALSE). 

Relative importance metrics: 

               lmg
npnews  0.08787767
locnews 0.02812186
talkrad 0.06292808
natnews 0.01819988

Average coefficients for different model sizes: 

                1X        2Xs        3Xs        4Xs
npnews   0.4629151  0.4650344  0.4695145  0.4716805
locnews -0.2108792 -0.3214204 -0.3986934 -0.4440727
talkrad  0.8869822  0.8677825  0.8527704  0.8324193
natnews  0.2444027  0.2313024  0.2191853  0.2086020
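To make "averaging over orders" concrete, the sketch below computes the lmg shares by brute force: for every ordering of the regressors, each regressor is credited with the increase in $R^2$ at the moment it enters, and the credits are averaged over orderings. It assumes the built-in mtcars data as a stand-in and uses hypothetical helper names (r2, perms); it is not how relaimpo is implemented internally, only an illustration of the definition.

```r
# LMG shares by brute force: average each regressor's R-squared increment
# over all orderings. mtcars is a built-in stand-in data set.
r2 <- function(vars, data, y = "mpg") {
  if (length(vars) == 0) return(0)  # empty model explains nothing
  summary(lm(reformulate(vars, response = y), data = data))$r.squared
}

# All permutations of a vector, returned as a list (hypothetical helper)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

xs     <- c("wt", "hp", "qsec")
orders <- perms(xs)

lmg <- setNames(numeric(length(xs)), xs)
for (ord in orders) {
  for (k in seq_along(ord)) {
    # credit ord[k] with the gain from entering after ord[1], ..., ord[k - 1]
    lmg[ord[k]] <- lmg[ord[k]] +
      r2(ord[seq_len(k)], mtcars) - r2(ord[seq_len(k - 1)], mtcars)
  }
}
lmg <- lmg / length(orders)

lmg
sum(lmg)  # matches the full-model R-squared
```

The shares add up to the model $R^2$ by construction. The same property holds in the output above, where the four lmg values sum to 0.1971, the 19.71% of variance explained by the model.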

8.5 Dominance Analysis

see: https://cran.r-project.org/web/packages/domir/vignettes/domir_basics.html

domir(
  pknow ~ npnews + locnews + talkrad + natnews, 
  function(formula) {
    lm_model <- lm(formula, data = df_pol)
    summary(lm_model)[["r.squared"]]
  }
)

Overall Value:      0.1971275 

General Dominance Values:
        General Dominance Standardized Ranks
npnews         0.08787767   0.44579104     1
locnews        0.02812186   0.14265824     3
talkrad        0.06292808   0.31922529     2
natnews        0.01819988   0.09232542     4

Conditional Dominance Values:
        Include At: 1 Include At: 2 Include At: 3 Include At: 4
npnews     0.08896006    0.08783655    0.08769250    0.08702156
locnews    0.01133509    0.02477780    0.03521980    0.04115476
talkrad    0.06808081    0.06448014    0.06150433    0.05764703
natnews    0.02193340    0.02061423    0.01791688    0.01233499

Complete Dominance Proportions:
          > npnews > locnews > talkrad > natnews
npnews >        NA      1.00         1      1.00
locnews >        0        NA         0      0.75
talkrad >        0      1.00        NA      1.00
natnews >        0      0.25         0        NA
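The three blocks of this output are linked: each general dominance value is the average of that regressor's conditional dominance values across the four inclusion levels, and the standardized column divides by the overall $R^2$. The sketch below checks this using only numbers copied by hand from the printed output; the object names are hypothetical.

```r
# Conditional dominance values copied from the printed domir output
# (columns = "Include At" levels 1 to 4).
cond_dom <- rbind(
  npnews  = c(0.08896006, 0.08783655, 0.08769250, 0.08702156),
  locnews = c(0.01133509, 0.02477780, 0.03521980, 0.04115476),
  talkrad = c(0.06808081, 0.06448014, 0.06150433, 0.05764703),
  natnews = c(0.02193340, 0.02061423, 0.01791688, 0.01233499)
)

# General dominance = mean of the conditional values across levels
gen_dom <- rowMeans(cond_dom)
round(gen_dom, 4)

# Standardized values = share of the overall R-squared
round(gen_dom / sum(gen_dom), 4)

# The general dominance values add up to the overall value
sum(gen_dom)
```

The row means also match the lmg shares reported by calc.relimp() in the previous section, so the two packages are describing the same decomposition of $R^2$.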

Encyclopedia of Quantitative Methods in R, vol. 4: Multiple Linear Regression
