7 D&H Ch7 - Regression for Prediction: “mtcars”
Darlington & Hayes, Chapter 7’s example
# install.packages("remotes")
# remotes::install_github("sarbearschwartz/apaSupp")
# remotes::install_github("ddsjoberg/gtsummary")
library(tidyverse)
library(flextable)
library(apaSupp)
library(car)
library(rempsyc)
library(parameters)
library(performance)
library(interactions)
library(ggResidpanel)
library(olsrr)          # ols_step_*() variable-selection routines used in 7.3
7.1 PURPOSE
RESEARCH QUESTION:
What variables predict fuel efficiency?
7.1.1 Data Description
Five candidate predictors of mpg (miles per gallon):
disp: displacement, cubic inches
hp: gross horsepower
drat: rear axle ratio
wt: weight, 1000s of pounds
qsec: 1/4 mile time, seconds
# A tibble: 32 × 11
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
# ℹ 22 more rows
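For a single-predictor regression, R-squared equals the squared Pearson correlation, so the five candidates can be screened directly on the built-in `mtcars` data before any model selection. A minimal base-R sketch; the object names `predictors` and `r2_simple` are illustrative only:

```r
# Candidate predictors named in the data description above
predictors <- c("disp", "hp", "drat", "wt", "qsec")

# Squared correlation of each candidate with mpg; for a one-predictor
# regression this equals that model's R-squared
r2_simple <- sapply(predictors, function(v) cor(mtcars$mpg, mtcars[[v]])^2)

# Rank the candidates from strongest to weakest bivariate association
round(sort(r2_simple, decreasing = TRUE), 4)
```

The ranking (wt, disp, hp, drat, qsec) should match the one-predictor rows of the all-possible-subsets table in 7.3.1.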
7.2 REGRESSION ANALYSIS
Call:
lm(formula = mpg ~ disp + hp + drat + wt + qsec, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.5404 -1.6701 -0.4264 1.1320 5.4996
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.53357 10.96423 1.508 0.14362
disp 0.00872 0.01119 0.779 0.44281
hp -0.02060 0.01528 -1.348 0.18936
drat 2.01578 1.30946 1.539 0.13579
wt -4.38546 1.24343 -3.527 0.00158 **
qsec 0.64015 0.45934 1.394 0.17523
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.558 on 26 degrees of freedom
Multiple R-squared: 0.8489, Adjusted R-squared: 0.8199
F-statistic: 29.22 on 5 and 26 DF, p-value: 6.892e-10
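The overall F test in this output can be rebuilt from R-squared alone, using F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) with k = 5 predictors and n = 32 cars. A short base-R check; `fit_full` and `f_manual` are illustrative names:

```r
# Refit the full five-predictor model shown above
fit_full <- lm(mpg ~ disp + hp + drat + wt + qsec, data = mtcars)
s <- summary(fit_full)

n  <- nrow(mtcars)                 # 32 cars
k  <- length(coef(fit_full)) - 1   # 5 predictors (intercept excluded)
r2 <- s$r.squared

# Overall F statistic reconstructed from R-squared
f_manual <- (r2 / k) / ((1 - r2) / (n - k - 1))

c(manual = f_manual, lm = unname(s$fstatistic["value"]))
```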
7.3 AUTOMATED VARIABLE SELECTION
https://olsrr.rsquaredacademy.com/articles/variable_selection
7.3.1 All Possible Subsets
Index N Predictors R-Square Adj. R-Square Mallow's Cp
4 1 1 wt 0.7528328 0.7445939 14.534575
1 2 1 disp 0.7183433 0.7089548 20.469805
2 3 1 hp 0.6024373 0.5891853 40.415867
3 4 1 drat 0.4639952 0.4461283 64.240140
5 5 1 qsec 0.1752963 0.1478062 113.921823
11 6 2 hp wt 0.8267855 0.8148396 3.808191
15 7 2 wt qsec 0.8264161 0.8144448 3.871747
8 8 2 disp wt 0.7809306 0.7658223 11.699275
13 9 2 drat wt 0.7608970 0.7444071 15.146825
6 10 2 disp hp 0.7482402 0.7308774 17.324909
10 11 2 hp drat 0.7411716 0.7233214 18.541329
7 12 2 disp drat 0.7310094 0.7124583 20.290124
9 13 2 disp qsec 0.7215598 0.7023571 21.916283
12 14 2 hp qsec 0.6368769 0.6118339 36.489228
14 15 2 drat qsec 0.5921951 0.5640706 44.178433
25 16 3 drat wt qsec 0.8370214 0.8195594 4.046701
22 17 3 hp drat wt 0.8368791 0.8194018 4.071201
24 18 3 hp wt qsec 0.8347678 0.8170643 4.434529
17 19 3 disp hp wt 0.8268361 0.8082829 5.799467
21 20 3 disp wt qsec 0.8264170 0.8078189 5.871590
19 21 3 disp drat wt 0.7835315 0.7603385 13.251685
16 22 3 disp hp drat 0.7750131 0.7509073 14.717610
18 23 3 disp hp qsec 0.7541953 0.7278591 18.300105
23 24 3 hp drat qsec 0.7442512 0.7168495 20.011366
20 25 3 disp drat qsec 0.7412673 0.7135459 20.524864
30 26 4 hp drat wt qsec 0.8453853 0.8224794 4.607381
29 27 4 disp drat wt qsec 0.8383592 0.8144124 5.816483
26 28 4 disp hp drat wt 0.8376289 0.8135739 5.942167
28 29 4 disp hp wt qsec 0.8351443 0.8107212 6.369735
27 30 4 disp hp drat qsec 0.7766318 0.7435402 16.439042
31 31 5 disp hp drat wt qsec 0.8489147 0.8198599 6.000000
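The all-possible-subsets search simply fits every one of the 2^5 - 1 = 31 models. A base-R sketch of the same idea, assuming Mallows' Cp is computed as SSE_p / MSE_full - (n - 2p), with p counting the intercept; under that definition the full model's Cp always equals its number of parameters (6 here), which matches the last row above. Object names are illustrative:

```r
# All possible subsets of the five candidate predictors, scored by
# R-squared, adjusted R-squared, and Mallows' Cp
predictors <- c("disp", "hp", "drat", "wt", "qsec")
n <- nrow(mtcars)

fit_full <- lm(reformulate(predictors, response = "mpg"), data = mtcars)
mse_full <- sum(residuals(fit_full)^2) / df.residual(fit_full)

rows <- list()
for (size in seq_along(predictors)) {
  for (vars in combn(predictors, size, simplify = FALSE)) {
    fit <- lm(reformulate(vars, response = "mpg"), data = mtcars)
    s   <- summary(fit)
    p   <- length(vars) + 1                 # parameters, including intercept
    sse <- sum(residuals(fit)^2)
    rows[[length(rows) + 1]] <- data.frame(
      n_pred     = length(vars),
      predictors = paste(vars, collapse = " "),
      r2         = s$r.squared,
      adj_r2     = s$adj.r.squared,
      cp         = sse / mse_full - (n - 2 * p)   # Mallows' Cp
    )
  }
}
subsets <- do.call(rbind, rows)

# Within each subset size, list the highest R-squared first
subsets <- subsets[order(subsets$n_pred, -subsets$r2), ]
head(subsets)
```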
7.3.2 Best Subsets
Best Subsets Regression
-----------------------------------
Model Index Predictors
-----------------------------------
1 wt
2 hp wt
3 drat wt qsec
4 hp drat wt qsec
5 disp hp drat wt qsec
-----------------------------------
Subsets Regression Summary
---------------------------------------------------------------------------------------------------------------------------------
Adj. Pred
Model R-Square R-Square R-Square C(p) AIC SBIC SBC MSEP FPE HSP APC
---------------------------------------------------------------------------------------------------------------------------------
1 0.7528 0.7446 0.7087 14.5346 166.0294 74.1040 170.4266 296.9167 9.8572 0.3199 0.2801
2 0.8268 0.8148 0.7811 3.8082 156.6523 66.2706 162.5153 215.5104 7.3563 0.2402 0.2091
3 0.8370 0.8196 0.7765 4.0467 156.7031 66.9790 164.0318 210.2851 7.3736 0.2428 0.2095
4 0.8454 0.8225 0.7804 4.6074 157.0173 68.1498 165.8117 207.1664 7.4558 0.2480 0.2119
5 0.8489 0.8199 0.7666 6.0000 158.2784 70.1290 168.5385 210.5348 7.7703 0.2617 0.2208
---------------------------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria
SBIC: Sawa's Bayesian Information Criteria
SBC: Schwarz Bayesian Criteria
MSEP: Estimated error of prediction, assuming multivariate normality
FPE: Final Prediction Error
HSP: Hocking's Sp
APC: Amemiya Prediction Criteria
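Best subsets keeps only the highest-R-squared model at each size and then compares those winners on penalized criteria. The sketch below does this in base R, using `AIC()` and `BIC()` for the AIC and SBC columns and a PRESS-based predicted R-squared (leave-one-out residuals e_i / (1 - h_ii)). That olsrr's "Pred R-Square" uses exactly this PRESS definition is an assumption worth checking against the column above; the names `best`, `pred_r2`, and `best_tab` are illustrative:

```r
predictors <- c("disp", "hp", "drat", "wt", "qsec")
y   <- mtcars$mpg
sst <- sum((y - mean(y))^2)

# For each subset size, keep the subset with the largest R-squared
best <- lapply(seq_along(predictors), function(size) {
  fits <- lapply(combn(predictors, size, simplify = FALSE), function(vars)
    lm(reformulate(vars, response = "mpg"), data = mtcars))
  r2 <- sapply(fits, function(f) summary(f)$r.squared)
  fits[[which.max(r2)]]
})

# PRESS residuals e_i / (1 - h_ii) give a leave-one-out predicted R-squared
pred_r2 <- function(fit) {
  press <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)
  1 - press / sst
}

best_tab <- data.frame(
  predictors = sapply(best, function(f) paste(attr(terms(f), "term.labels"), collapse = " ")),
  r2         = sapply(best, function(f) summary(f)$r.squared),
  pred_r2    = sapply(best, pred_r2),
  aic        = sapply(best, AIC),
  sbc        = sapply(best, BIC)
)
best_tab
```

Because each PRESS residual is at least as large as the ordinary residual, predicted R-squared can never exceed R-squared; the gap is a rough measure of overfitting.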
7.3.3 Forward Stepwise, p-value
Stepwise Summary
-------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------
0 Base Model 208.756 211.687 114.990 0.00000 0.00000
1 wt 166.029 170.427 74.104 0.75283 0.74459
2 hp 156.652 162.515 66.271 0.82679 0.81484
3 drat 156.731 164.060 67.000 0.83688 0.81940
4 qsec 157.017 165.812 68.150 0.84539 0.82248
-------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.919 RMSE 2.333
R-Squared 0.845 MSE 5.441
Adj. R-Squared 0.822 Coef. Var 12.639
Pred R-Squared 0.780 AIC 157.017
MAE 1.852 SBC 165.812
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 951.944 4 237.986 36.907 0.0000
Residual 174.103 27 6.448
Total 1126.047 31
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 19.260 10.315 1.867 0.073 -1.906 40.425
wt -3.708 0.882 -0.602 -4.202 0.000 -5.518 -1.897
hp -0.018 0.015 -0.203 -1.209 0.237 -0.048 0.012
drat 1.657 1.217 0.147 1.362 0.185 -0.840 4.154
qsec 0.528 0.433 0.156 1.219 0.233 -0.361 1.416
----------------------------------------------------------------------------------------
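The forward p-value procedure can be sketched with base R's `add1()`, which reports a partial F test for each candidate not yet in the model. The entry threshold `p_enter = 0.3` is an assumption meant to mirror olsrr's default; it is consistent with the output above, where every retained predictor has p < 0.3 and disp (p = 0.443 in the full model) never enters. Names are illustrative:

```r
# Forward selection by p-value: at each step, add the candidate with the
# smallest partial F-test p-value; stop when none falls below p_enter
p_enter   <- 0.3   # assumed to mirror olsrr's default entry threshold
scope_all <- ~ disp + hp + drat + wt + qsec
n_cand    <- length(all.vars(scope_all))

fit     <- lm(mpg ~ 1, data = mtcars)
entered <- character(0)

repeat {
  if (length(entered) == n_cand) break       # every candidate already entered
  tab  <- add1(fit, scope = scope_all, test = "F")
  p    <- setNames(tab[["Pr(>F)"]], rownames(tab))[-1]   # drop "<none>" row
  best <- names(which.min(p))
  if (p[[best]] >= p_enter) break
  entered <- c(entered, best)
  fit <- update(fit, as.formula(paste(". ~ . +", best)))
}

entered
summary(fit)$coefficients
```

For a single added term the partial F test is equivalent to the two-sided t test (F = t^2), so the final p-values agree with the Sig column above.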
7.3.4 Backwards Stepwise, p-value
Stepwise Summary
------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
------------------------------------------------------------------------
0 Full Model 158.278 168.539 70.129 0.84891 0.81986
1 disp 157.017 165.812 68.150 0.84539 0.82248
------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.919 RMSE 2.333
R-Squared 0.845 MSE 5.441
Adj. R-Squared 0.822 Coef. Var 12.639
Pred R-Squared 0.780 AIC 157.017
MAE 1.852 SBC 165.812
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 951.944 4 237.986 36.907 0.0000
Residual 174.103 27 6.448
Total 1126.047 31
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 19.260 10.315 1.867 0.073 -1.906 40.425
hp -0.018 0.015 -0.203 -1.209 0.237 -0.048 0.012
drat 1.657 1.217 0.147 1.362 0.185 -0.840 4.154
wt -3.708 0.882 -0.602 -4.202 0.000 -5.518 -1.897
qsec 0.528 0.433 0.156 1.219 0.233 -0.361 1.416
----------------------------------------------------------------------------------------
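Backward elimination runs the same logic in reverse with `drop1()`: start from all five predictors and remove the term with the largest partial F-test p-value while it exceeds the removal threshold. `p_remove = 0.3` is again an assumed stand-in for olsrr's default. Only disp is dropped, after which the largest remaining p-value (hp, 0.237) is below the threshold:

```r
# Backward elimination by p-value: drop the term with the largest
# partial F-test p-value while that p-value exceeds p_remove
p_remove <- 0.3   # assumed to mirror olsrr's default removal threshold

fit     <- lm(mpg ~ disp + hp + drat + wt + qsec, data = mtcars)
removed <- character(0)

repeat {
  if (length(attr(terms(fit), "term.labels")) == 0) break
  tab   <- drop1(fit, test = "F")
  p     <- setNames(tab[["Pr(>F)"]], rownames(tab))[-1]   # drop "<none>" row
  worst <- names(which.max(p))
  if (p[[worst]] <= p_remove) break
  removed <- c(removed, worst)
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}

removed
attr(terms(fit), "term.labels")
```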
7.3.5 Forward Stepwise, R-squared
Stepwise Summary
-------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------
0 Base Model 208.756 211.687 114.990 0.00000 0.00000
1 wt 166.029 170.427 74.104 0.75283 0.74459
2 hp 156.652 162.515 66.271 0.82679 0.81484
3 drat 156.731 164.060 67.000 0.83688 0.81940
4 qsec 157.017 165.812 68.150 0.84539 0.82248
5 disp 158.278 168.539 70.129 0.84891 0.81986
-------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.921 RMSE 2.306
R-Squared 0.849 MSE 5.317
Adj. R-Squared 0.820 Coef. Var 12.732
Pred R-Squared 0.767 AIC 158.278
MAE 1.840 SBC 168.539
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 955.918 5 191.184 29.218 0.0000
Residual 170.129 26 6.543
Total 1126.047 31
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 16.534 10.964 1.508 0.144 -6.004 39.071
wt -4.385 1.243 -0.712 -3.527 0.002 -6.941 -1.830
hp -0.021 0.015 -0.234 -1.348 0.189 -0.052 0.011
drat 2.016 1.309 0.179 1.539 0.136 -0.676 4.707
qsec 0.640 0.459 0.190 1.394 0.175 -0.304 1.584
disp 0.009 0.011 0.179 0.779 0.443 -0.014 0.032
----------------------------------------------------------------------------------------
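R-squared can only stay the same or rise when a predictor is added to a nested OLS model, so a forward search that accepts any increase in R-squared never stops early; that is why disp enters at step 5 and the final model is the full model. A base-R check along the entry order reported above (`entry_order` and `r2_path` are illustrative names):

```r
# Entry order reported in the stepwise summary above
entry_order <- c("wt", "hp", "drat", "qsec", "disp")

# R-squared for each nested model along that path
r2_path <- sapply(seq_along(entry_order), function(i) {
  f <- reformulate(entry_order[1:i], response = "mpg")
  summary(lm(f, data = mtcars))$r.squared
})
names(r2_path) <- entry_order

round(r2_path, 5)
all(diff(r2_path) >= 0)   # TRUE: R-squared never decreases along a nested path
```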
7.3.6 Forward Stepwise, Adjusted R-squared
Stepwise Summary
-------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------
0 Base Model 208.756 211.687 114.990 0.00000 0.00000
1 wt 166.029 170.427 74.104 0.75283 0.74459
2 hp 156.652 162.515 66.271 0.82679 0.81484
3 drat 156.731 164.060 67.000 0.83688 0.81940
4 qsec 157.017 165.812 68.150 0.84539 0.82248
-------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.919 RMSE 2.333
R-Squared 0.845 MSE 5.441
Adj. R-Squared 0.822 Coef. Var 12.639
Pred R-Squared 0.780 AIC 157.017
MAE 1.852 SBC 165.812
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 951.944 4 237.986 36.907 0.0000
Residual 174.103 27 6.448
Total 1126.047 31
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 19.260 10.315 1.867 0.073 -1.906 40.425
wt -3.708 0.882 -0.602 -4.202 0.000 -5.518 -1.897
hp -0.018 0.015 -0.203 -1.209 0.237 -0.048 0.012
drat 1.657 1.217 0.147 1.362 0.185 -0.840 4.154
qsec 0.528 0.433 0.156 1.219 0.233 -0.361 1.416
----------------------------------------------------------------------------------------
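Forward selection on adjusted R-squared stops at four predictors because the penalty in adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1) outweighs the small R-squared gain from disp (0.82248 falls to 0.81986 in the summary above). Recomputing both candidate models by hand, with `adj_r2()` as an illustrative helper:

```r
n <- nrow(mtcars)

# Adjusted R-squared from R-squared, sample size n, and k predictors
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

fit4 <- lm(mpg ~ wt + hp + drat + qsec, data = mtcars)
fit5 <- update(fit4, . ~ . + disp)

r2_4 <- summary(fit4)$r.squared
r2_5 <- summary(fit5)$r.squared

# R-squared rises slightly, but adjusted R-squared falls
c(four = adj_r2(r2_4, n, 4), five = adj_r2(r2_5, n, 5))
```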
7.3.7 Forward Stepwise, AIC
Stepwise Summary
-------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------
0 Base Model 208.756 211.687 114.990 0.00000 0.00000
1 wt 166.029 170.427 74.104 0.75283 0.74459
2 hp 156.652 162.515 66.271 0.82679 0.81484
-------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.909 RMSE 2.469
R-Squared 0.827 MSE 6.095
Adj. R-Squared 0.815 Coef. Var 12.909
Pred R-Squared 0.781 AIC 156.652
MAE 1.901 SBC 162.515
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 930.999 2 465.500 69.211 0.0000
Residual 195.048 29 6.726
Total 1126.047 31
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 37.227 1.599 23.285 0.000 33.957 40.497
wt -3.878 0.633 -0.630 -6.129 0.000 -5.172 -2.584
hp -0.032 0.009 -0.361 -3.519 0.001 -0.050 -0.013
----------------------------------------------------------------------------------------
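Base R's `step()` offers an AIC-driven forward search without olsrr. It stops after wt and hp because adding drat would raise AIC from 156.652 to 156.731. Note that `step()` ranks models with `extractAIC()`, which for `lm` fits drops terms that are constant for a fixed n; it therefore differs from `AIC()` by a fixed offset and selects the same model. A sketch, with `null_fit` and `fit_aic` as illustrative names:

```r
# Forward selection on AIC with base R's step(); k = 2 is the AIC penalty
null_fit <- lm(mpg ~ 1, data = mtcars)
fit_aic  <- step(null_fit,
                 scope = list(lower = ~ 1, upper = ~ disp + hp + drat + wt + qsec),
                 direction = "forward", k = 2, trace = 0)

attr(terms(fit_aic), "term.labels")
AIC(fit_aic)

# extractAIC() and AIC() differ by n * (log(2 * pi) + 1) + 2 for an lm fit
n <- nrow(mtcars)
AIC(fit_aic) - extractAIC(fit_aic)[2]
```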
7.3.8 Forward Stepwise, Schwarz Bayesian Criterion
Stepwise Summary
-------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------
0 Base Model 208.756 211.687 114.990 0.00000 0.00000
1 wt 166.029 170.427 74.104 0.75283 0.74459
2 hp 156.652 162.515 66.271 0.82679 0.81484
-------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------
R 0.909 RMSE 2.469
R-Squared 0.827 MSE 6.095
Adj. R-Squared 0.815 Coef. Var 12.909
Pred R-Squared 0.781 AIC 156.652
MAE 1.901 SBC 162.515
---------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 930.999 2 465.500 69.211 0.0000
Residual 195.048 29 6.726
Total 1126.047 31
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 37.227 1.599 23.285 0.000 33.957 40.497
wt -3.878 0.633 -0.630 -6.129 0.000 -5.172 -2.584
hp -0.032 0.009 -0.361 -3.519 0.001 -0.050 -0.013
----------------------------------------------------------------------------------------
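The Schwarz criterion differs from AIC only in its per-parameter penalty, log(n) instead of 2, so the same `step()` search can be rerun with `k = log(n)`. With n = 32, log(n) is about 3.47, so SBC penalizes each added term more heavily than AIC; here both criteria happen to select the same wt + hp model. A sketch with illustrative names:

```r
# Forward selection on the Schwarz criterion: same step() search,
# per-parameter penalty k = log(n) instead of 2
n        <- nrow(mtcars)
null_fit <- lm(mpg ~ 1, data = mtcars)
fit_sbc  <- step(null_fit,
                 scope = list(lower = ~ 1, upper = ~ disp + hp + drat + wt + qsec),
                 direction = "forward", k = log(n), trace = 0)

attr(terms(fit_sbc), "term.labels")
BIC(fit_sbc)
```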