7 D&H Ch7 - Regression for Prediction: “mtcars”

Example from Chapter 7 of Darlington & Hayes

# install.packages("remotes")
# install.packages("olsrr")   # selection functions below are called as olsrr::
# remotes::install_github("sarbearschwartz/apaSupp")
# remotes::install_github("ddsjoberg/gtsummary")

library(tidyverse)
library(flextable)
library(apaSupp)
library(car)
library(rempsyc)
library(parameters)
library(performance)
library(interactions)
library(ggResidpanel)
flextable::set_flextable_defaults(digits = 2)

7.1 PURPOSE

RESEARCH QUESTION:

What variables predict fuel efficiency?

7.1.1 Data Description

The outcome is mpg (miles per gallon). The five candidate predictors are:

  • disp displacement, cubic inches
  • hp gross horsepower
  • drat rear axle ratio
  • wt weight, 1000s pounds
  • qsec 1/4 mile time, seconds
mtcars
# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# ℹ 22 more rows

7.2 REGRESSION ANALYSIS

fit_mtcars_1 <- lm(mpg ~ disp + hp + drat + wt + qsec, 
                   data = mtcars)

summary(fit_mtcars_1)

Call:
lm(formula = mpg ~ disp + hp + drat + wt + qsec, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5404 -1.6701 -0.4264  1.1320  5.4996 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 16.53357   10.96423   1.508  0.14362   
disp         0.00872    0.01119   0.779  0.44281   
hp          -0.02060    0.01528  -1.348  0.18936   
drat         2.01578    1.30946   1.539  0.13579   
wt          -4.38546    1.24343  -3.527  0.00158 **
qsec         0.64015    0.45934   1.394  0.17523   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.558 on 26 degrees of freedom
Multiple R-squared:  0.8489,    Adjusted R-squared:  0.8199 
F-statistic: 29.22 on 5 and 26 DF,  p-value: 6.892e-10
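The adjusted R-squared above penalizes R-squared for the number of predictors: adj R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), here with n = 32 cars and k = 5 predictors. A minimal base-R check of that formula, assuming only the built-in mtcars data:

```r
# Refit the five-predictor model from above (built-in mtcars data)
fit <- lm(mpg ~ disp + hp + drat + wt + qsec, data = mtcars)

r2 <- summary(fit)$r.squared
n  <- nrow(mtcars)            # 32 cars
k  <- length(coef(fit)) - 1   # 5 predictors (intercept excluded)

# Adjusted R-squared: scale the unexplained share by (n - 1) / (n - k - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

round(c(r2 = r2, adj_r2 = adj_r2), 4)
```

The two values should agree with the 0.8489 and 0.8199 printed by summary().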

7.3 AUTOMATED VARIABLE SELECTION

The procedures below use functions from the olsrr package; its variable selection article is at https://olsrr.rsquaredacademy.com/articles/variable_selection

7.3.1 All Possible Subsets

fit_mtcars_1_all <- olsrr::ols_step_all_possible(fit_mtcars_1)

fit_mtcars_1_all
   Index N           Predictors  R-Square Adj. R-Square Mallow's Cp
4      1 1                   wt 0.7528328     0.7445939   14.534575
1      2 1                 disp 0.7183433     0.7089548   20.469805
2      3 1                   hp 0.6024373     0.5891853   40.415867
3      4 1                 drat 0.4639952     0.4461283   64.240140
5      5 1                 qsec 0.1752963     0.1478062  113.921823
11     6 2                hp wt 0.8267855     0.8148396    3.808191
15     7 2              wt qsec 0.8264161     0.8144448    3.871747
8      8 2              disp wt 0.7809306     0.7658223   11.699275
13     9 2              drat wt 0.7608970     0.7444071   15.146825
6     10 2              disp hp 0.7482402     0.7308774   17.324909
10    11 2              hp drat 0.7411716     0.7233214   18.541329
7     12 2            disp drat 0.7310094     0.7124583   20.290124
9     13 2            disp qsec 0.7215598     0.7023571   21.916283
12    14 2              hp qsec 0.6368769     0.6118339   36.489228
14    15 2            drat qsec 0.5921951     0.5640706   44.178433
25    16 3         drat wt qsec 0.8370214     0.8195594    4.046701
22    17 3           hp drat wt 0.8368791     0.8194018    4.071201
24    18 3           hp wt qsec 0.8347678     0.8170643    4.434529
17    19 3           disp hp wt 0.8268361     0.8082829    5.799467
21    20 3         disp wt qsec 0.8264170     0.8078189    5.871590
19    21 3         disp drat wt 0.7835315     0.7603385   13.251685
16    22 3         disp hp drat 0.7750131     0.7509073   14.717610
18    23 3         disp hp qsec 0.7541953     0.7278591   18.300105
23    24 3         hp drat qsec 0.7442512     0.7168495   20.011366
20    25 3       disp drat qsec 0.7412673     0.7135459   20.524864
30    26 4      hp drat wt qsec 0.8453853     0.8224794    4.607381
29    27 4    disp drat wt qsec 0.8383592     0.8144124    5.816483
26    28 4      disp hp drat wt 0.8376289     0.8135739    5.942167
28    29 4      disp hp wt qsec 0.8351443     0.8107212    6.369735
27    30 4    disp hp drat qsec 0.7766318     0.7435402   16.439042
31    31 5 disp hp drat wt qsec 0.8489147     0.8198599    6.000000
plot(fit_mtcars_1_all)
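With five candidates there are 2^5 - 1 = 31 non-empty subsets, which is why the table has 31 rows. The sketch below enumerates them in base R and computes Mallows' Cp as SSE_p / MSE_full - n + 2p, where p counts parameters including the intercept and MSE_full comes from the five-predictor model. This is an illustration of the technique, not olsrr's own code; the object names are hypothetical.

```r
# All-possible-subsets regression with Mallows' Cp (base R sketch)
predictors <- c("disp", "hp", "drat", "wt", "qsec")
full_fit   <- lm(reformulate(predictors, response = "mpg"), data = mtcars)
n          <- nrow(mtcars)
mse_full   <- sum(resid(full_fit)^2) / df.residual(full_fit)  # sigma^2 estimate from the full model

rows <- list()
for (size in seq_along(predictors)) {
  for (vars in combn(predictors, size, simplify = FALSE)) {
    fit <- lm(reformulate(vars, response = "mpg"), data = mtcars)
    s   <- summary(fit)
    p   <- size + 1                      # parameters, including the intercept
    sse <- sum(resid(fit)^2)
    rows[[length(rows) + 1]] <- data.frame(
      n_pred     = size,
      predictors = paste(vars, collapse = " "),
      r2         = s$r.squared,
      adj_r2     = s$adj.r.squared,
      cp         = sse / mse_full - n + 2 * p,   # Mallows' Cp
      stringsAsFactors = FALSE
    )
  }
}
all_subsets <- do.call(rbind, rows)

# Order as in the olsrr table: by model size, then by R-squared within size
all_subsets <- all_subsets[order(all_subsets$n_pred, -all_subsets$r2), ]
head(all_subsets)
```

For the full model Cp equals p = 6 by construction, matching the last row of the table.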

7.3.2 Best Subsets

olsrr::ols_step_best_subset(fit_mtcars_1)
      Best Subsets Regression      
-----------------------------------
Model Index    Predictors
-----------------------------------
     1         wt                   
     2         hp wt                
     3         drat wt qsec         
     4         hp drat wt qsec      
     5         disp hp drat wt qsec 
-----------------------------------

                                                   Subsets Regression Summary                                                    
---------------------------------------------------------------------------------------------------------------------------------
                       Adj.        Pred                                                                                           
Model    R-Square    R-Square    R-Square     C(p)        AIC        SBIC        SBC         MSEP       FPE       HSP       APC  
---------------------------------------------------------------------------------------------------------------------------------
  1        0.7528      0.7446      0.7087    14.5346    166.0294    74.1040    170.4266    296.9167    9.8572    0.3199    0.2801 
  2        0.8268      0.8148      0.7811     3.8082    156.6523    66.2706    162.5153    215.5104    7.3563    0.2402    0.2091 
  3        0.8370      0.8196      0.7765     4.0467    156.7031    66.9790    164.0318    210.2851    7.3736    0.2428    0.2095 
  4        0.8454      0.8225      0.7804     4.6074    157.0173    68.1498    165.8117    207.1664    7.4558    0.2480    0.2119 
  5        0.8489      0.8199      0.7666     6.0000    158.2784    70.1290    168.5385    210.5348    7.7703    0.2617    0.2208 
---------------------------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria 
 SBIC: Sawa's Bayesian Information Criteria 
 SBC: Schwarz Bayesian Criteria 
 MSEP: Estimated error of prediction, assuming multivariate normality 
 FPE: Final Prediction Error 
 HSP: Hocking's Sp 
 APC: Amemiya Prediction Criteria 
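Best subsets keeps only the highest-R-squared model of each size and then compares those winners on several criteria. A base-R sketch, assuming that predicted R-squared is 1 - PRESS/SST (PRESS being the sum of squared leave-one-out residuals, e_i / (1 - h_i)) and that AIC and SBC match stats::AIC() and stats::BIC(); the helper pred_r2() is hypothetical, and other olsrr columns (SBIC, MSEP, FPE, HSP, APC) are not reproduced here.

```r
# Best subset of each size (highest R-squared), with a few comparison criteria
predictors <- c("disp", "hp", "drat", "wt", "qsec")
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Predicted R-squared from the PRESS statistic (leave-one-out residuals)
pred_r2 <- function(fit) {
  press <- sum((resid(fit) / (1 - hatvalues(fit)))^2)
  1 - press / sst
}

best <- do.call(rbind, lapply(seq_along(predictors), function(size) {
  subsets <- combn(predictors, size, simplify = FALSE)
  fits    <- lapply(subsets, function(v) lm(reformulate(v, response = "mpg"), data = mtcars))
  r2      <- sapply(fits, function(f) summary(f)$r.squared)
  i       <- which.max(r2)                 # best model of this size
  data.frame(
    size       = size,
    predictors = paste(subsets[[i]], collapse = " "),
    r2         = r2[i],
    adj_r2     = summary(fits[[i]])$adj.r.squared,
    pred_r2    = pred_r2(fits[[i]]),
    aic        = AIC(fits[[i]]),
    sbc        = BIC(fits[[i]]),
    stringsAsFactors = FALSE
  )
}))
best
```

Because every leave-one-out residual is at least as large as the ordinary residual, predicted R-squared can never exceed R-squared.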

7.3.3 Forward Stepwise, p-value

olsrr::ols_step_forward_p(fit_mtcars_1)

                             Stepwise Summary                              
-------------------------------------------------------------------------
Step    Variable        AIC        SBC       SBIC        R2       Adj. R2 
-------------------------------------------------------------------------
 0      Base Model    208.756    211.687    114.990    0.00000    0.00000 
 1      wt            166.029    170.427     74.104    0.75283    0.74459 
 2      hp            156.652    162.515     66.271    0.82679    0.81484 
 3      drat          156.731    164.060     67.000    0.83688    0.81940 
 4      qsec          157.017    165.812     68.150    0.84539    0.82248 
-------------------------------------------------------------------------

Final Model Output 
------------------

                         Model Summary                          
---------------------------------------------------------------
R                       0.919       RMSE                 2.333 
R-Squared               0.845       MSE                  5.441 
Adj. R-Squared          0.822       Coef. Var           12.639 
Pred R-Squared          0.780       AIC                157.017 
MAE                     1.852       SBC                165.812 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                               ANOVA                                 
--------------------------------------------------------------------
                Sum of                                              
               Squares        DF    Mean Square      F         Sig. 
--------------------------------------------------------------------
Regression     951.944         4        237.986    36.907    0.0000 
Residual       174.103        27          6.448                     
Total         1126.047        31                                    
--------------------------------------------------------------------

                                  Parameter Estimates                                    
----------------------------------------------------------------------------------------
      model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
----------------------------------------------------------------------------------------
(Intercept)    19.260        10.315                  1.867    0.073    -1.906    40.425 
         wt    -3.708         0.882       -0.602    -4.202    0.000    -5.518    -1.897 
         hp    -0.018         0.015       -0.203    -1.209    0.237    -0.048     0.012 
       drat     1.657         1.217        0.147     1.362    0.185    -0.840     4.154 
       qsec     0.528         0.433        0.156     1.219    0.233    -0.361     1.416 
----------------------------------------------------------------------------------------
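Forward selection on p-values adds, at each step, the candidate with the smallest p-value in the model that already contains the earlier picks, and stops when no candidate clears the entry threshold. In recent olsrr versions that threshold appears to default to p_val = 0.3 (check ?olsrr::ols_step_forward_p for your installed version), which would explain why hp, drat, and qsec enter even though none is significant at .05 in the final model. A base-R sketch of the same idea; forward_p() and its 0.3 default are assumptions, not olsrr's code.

```r
# Hypothetical helper: forward selection on coefficient p-values.
# At each step, add the candidate whose t-test p-value (in the model with the
# variables already selected) is smallest; stop when that p exceeds p_enter.
forward_p <- function(data, response, candidates, p_enter = 0.3) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    p_vals <- sapply(remaining, function(v) {
      fit <- lm(reformulate(c(selected, v), response = response), data = data)
      coef(summary(fit))[v, "Pr(>|t|)"]
    })
    best <- names(which.min(p_vals))
    if (p_vals[[best]] > p_enter) break   # no candidate clears the threshold
    selected <- c(selected, best)
  }
  selected
}

selected_p <- forward_p(mtcars, "mpg", c("disp", "hp", "drat", "wt", "qsec"))
selected_p
```

With a conventional p_enter = 0.05 the same helper would stop much earlier, which is a reminder that the threshold, not the data alone, drives the final model.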

7.3.4 Backward Stepwise, p-value

olsrr::ols_step_backward_p(fit_mtcars_1)

                             Stepwise Summary                             
------------------------------------------------------------------------
Step    Variable        AIC        SBC       SBIC       R2       Adj. R2 
------------------------------------------------------------------------
 0      Full Model    158.278    168.539    70.129    0.84891    0.81986 
 1      disp          157.017    165.812    68.150    0.84539    0.82248 
------------------------------------------------------------------------

Final Model Output 
------------------

                         Model Summary                          
---------------------------------------------------------------
R                       0.919       RMSE                 2.333 
R-Squared               0.845       MSE                  5.441 
Adj. R-Squared          0.822       Coef. Var           12.639 
Pred R-Squared          0.780       AIC                157.017 
MAE                     1.852       SBC                165.812 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                               ANOVA                                 
--------------------------------------------------------------------
                Sum of                                              
               Squares        DF    Mean Square      F         Sig. 
--------------------------------------------------------------------
Regression     951.944         4        237.986    36.907    0.0000 
Residual       174.103        27          6.448                     
Total         1126.047        31                                    
--------------------------------------------------------------------

                                  Parameter Estimates                                    
----------------------------------------------------------------------------------------
      model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
----------------------------------------------------------------------------------------
(Intercept)    19.260        10.315                  1.867    0.073    -1.906    40.425 
         hp    -0.018         0.015       -0.203    -1.209    0.237    -0.048     0.012 
       drat     1.657         1.217        0.147     1.362    0.185    -0.840     4.154 
         wt    -3.708         0.882       -0.602    -4.202    0.000    -5.518    -1.897 
       qsec     0.528         0.433        0.156     1.219    0.233    -0.361     1.416 
----------------------------------------------------------------------------------------
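Backward elimination runs the other way: start from all five predictors, drop the one with the largest p-value if it exceeds the removal threshold, refit, and repeat. Here disp (p = .443) is removed and the largest remaining p-value (hp, .237) is below 0.3, so elimination stops. A base-R sketch; backward_p() and its 0.3 default are assumptions chosen to mirror the output above.

```r
# Hypothetical helper: backward elimination on coefficient p-values.
backward_p <- function(data, response, candidates, p_remove = 0.3) {
  selected <- candidates
  while (length(selected) > 0) {
    fit    <- lm(reformulate(selected, response = response), data = data)
    # setNames keeps variable names even when only one predictor is left
    p_vals <- setNames(coef(summary(fit))[selected, "Pr(>|t|)"], selected)
    worst  <- names(which.max(p_vals))
    if (p_vals[[worst]] <= p_remove) break   # every remaining p is small enough
    selected <- setdiff(selected, worst)
  }
  selected
}

kept_p <- backward_p(mtcars, "mpg", c("disp", "hp", "drat", "wt", "qsec"))
kept_p
```

Forward and backward selection happen to agree on this data, but they need not in general.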

7.3.5 Forward Stepwise, R-squared

olsrr::ols_step_forward_r2(fit_mtcars_1)

                             Stepwise Summary                              
-------------------------------------------------------------------------
Step    Variable        AIC        SBC       SBIC        R2       Adj. R2 
-------------------------------------------------------------------------
 0      Base Model    208.756    211.687    114.990    0.00000    0.00000 
 1      wt            166.029    170.427     74.104    0.75283    0.74459 
 2      hp            156.652    162.515     66.271    0.82679    0.81484 
 3      drat          156.731    164.060     67.000    0.83688    0.81940 
 4      qsec          157.017    165.812     68.150    0.84539    0.82248 
 5      disp          158.278    168.539     70.129    0.84891    0.81986 
-------------------------------------------------------------------------

Final Model Output 
------------------

                         Model Summary                          
---------------------------------------------------------------
R                       0.921       RMSE                 2.306 
R-Squared               0.849       MSE                  5.317 
Adj. R-Squared          0.820       Coef. Var           12.732 
Pred R-Squared          0.767       AIC                158.278 
MAE                     1.840       SBC                168.539 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                               ANOVA                                 
--------------------------------------------------------------------
                Sum of                                              
               Squares        DF    Mean Square      F         Sig. 
--------------------------------------------------------------------
Regression     955.918         5        191.184    29.218    0.0000 
Residual       170.129        26          6.543                     
Total         1126.047        31                                    
--------------------------------------------------------------------

                                  Parameter Estimates                                    
----------------------------------------------------------------------------------------
      model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
----------------------------------------------------------------------------------------
(Intercept)    16.534        10.964                  1.508    0.144    -6.004    39.071 
         wt    -4.385         1.243       -0.712    -3.527    0.002    -6.941    -1.830 
         hp    -0.021         0.015       -0.234    -1.348    0.189    -0.052     0.011 
       drat     2.016         1.309        0.179     1.539    0.136    -0.676     4.707 
       qsec     0.640         0.459        0.190     1.394    0.175    -0.304     1.584 
       disp     0.009         0.011        0.179     0.779    0.443    -0.014     0.032 
----------------------------------------------------------------------------------------
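R-squared can never decrease when a predictor is added to a nested OLS model, so forward selection on raw R-squared keeps adding variables until the candidate list runs out; that is why disp enters at step 5 here. Adjusted R-squared, by contrast, peaks at four predictors. A short base-R check along the entry order shown in the table:

```r
# Entry order from the forward-selection table above
entry_order <- c("wt", "hp", "drat", "qsec", "disp")

# R-squared and adjusted R-squared for each nested model in that sequence
path <- t(sapply(seq_along(entry_order), function(step) {
  fit <- lm(reformulate(entry_order[1:step], response = "mpg"), data = mtcars)
  s   <- summary(fit)
  c(r2 = s$r.squared, adj_r2 = s$adj.r.squared)
}))
rownames(path) <- entry_order
round(path, 5)
```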

7.3.6 Forward Stepwise, Adjusted R-squared

olsrr::ols_step_forward_adj_r2(fit_mtcars_1)

                             Stepwise Summary                              
-------------------------------------------------------------------------
Step    Variable        AIC        SBC       SBIC        R2       Adj. R2 
-------------------------------------------------------------------------
 0      Base Model    208.756    211.687    114.990    0.00000    0.00000 
 1      wt            166.029    170.427     74.104    0.75283    0.74459 
 2      hp            156.652    162.515     66.271    0.82679    0.81484 
 3      drat          156.731    164.060     67.000    0.83688    0.81940 
 4      qsec          157.017    165.812     68.150    0.84539    0.82248 
-------------------------------------------------------------------------

Final Model Output 
------------------

                         Model Summary                          
---------------------------------------------------------------
R                       0.919       RMSE                 2.333 
R-Squared               0.845       MSE                  5.441 
Adj. R-Squared          0.822       Coef. Var           12.639 
Pred R-Squared          0.780       AIC                157.017 
MAE                     1.852       SBC                165.812 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                               ANOVA                                 
--------------------------------------------------------------------
                Sum of                                              
               Squares        DF    Mean Square      F         Sig. 
--------------------------------------------------------------------
Regression     951.944         4        237.986    36.907    0.0000 
Residual       174.103        27          6.448                     
Total         1126.047        31                                    
--------------------------------------------------------------------

                                  Parameter Estimates                                    
----------------------------------------------------------------------------------------
      model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
----------------------------------------------------------------------------------------
(Intercept)    19.260        10.315                  1.867    0.073    -1.906    40.425 
         wt    -3.708         0.882       -0.602    -4.202    0.000    -5.518    -1.897 
         hp    -0.018         0.015       -0.203    -1.209    0.237    -0.048     0.012 
       drat     1.657         1.217        0.147     1.362    0.185    -0.840     4.154 
       qsec     0.528         0.433        0.156     1.219    0.233    -0.361     1.416 
----------------------------------------------------------------------------------------
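Selecting on adjusted R-squared adds the candidate that gives the largest adjusted R-squared and stops as soon as no addition improves it, which here halts before disp. A base-R sketch; forward_adj_r2() is a hypothetical helper, and starting from 0 (the adjusted R-squared of the intercept-only model) is an assumption matching the "Base Model" row.

```r
# Hypothetical helper: forward selection maximizing adjusted R-squared
forward_adj_r2 <- function(data, response, candidates) {
  selected <- character(0)
  current  <- 0                            # adjusted R-squared of the intercept-only model
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    scores <- sapply(remaining, function(v) {
      fit <- lm(reformulate(c(selected, v), response = response), data = data)
      summary(fit)$adj.r.squared
    })
    best <- names(which.max(scores))
    if (scores[[best]] <= current) break   # stop when adjusted R-squared no longer improves
    selected <- c(selected, best)
    current  <- scores[[best]]
  }
  selected
}

selected_adj <- forward_adj_r2(mtcars, "mpg", c("disp", "hp", "drat", "wt", "qsec"))
selected_adj
```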

7.3.7 Forward Stepwise, AIC

olsrr::ols_step_forward_aic(fit_mtcars_1)

                             Stepwise Summary                              
-------------------------------------------------------------------------
Step    Variable        AIC        SBC       SBIC        R2       Adj. R2 
-------------------------------------------------------------------------
 0      Base Model    208.756    211.687    114.990    0.00000    0.00000 
 1      wt            166.029    170.427     74.104    0.75283    0.74459 
 2      hp            156.652    162.515     66.271    0.82679    0.81484 
-------------------------------------------------------------------------

Final Model Output 
------------------

                         Model Summary                          
---------------------------------------------------------------
R                       0.909       RMSE                 2.469 
R-Squared               0.827       MSE                  6.095 
Adj. R-Squared          0.815       Coef. Var           12.909 
Pred R-Squared          0.781       AIC                156.652 
MAE                     1.901       SBC                162.515 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                               ANOVA                                 
--------------------------------------------------------------------
                Sum of                                              
               Squares        DF    Mean Square      F         Sig. 
--------------------------------------------------------------------
Regression     930.999         2        465.500    69.211    0.0000 
Residual       195.048        29          6.726                     
Total         1126.047        31                                    
--------------------------------------------------------------------

                                  Parameter Estimates                                    
----------------------------------------------------------------------------------------
      model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
----------------------------------------------------------------------------------------
(Intercept)    37.227         1.599                 23.285    0.000    33.957    40.497 
         wt    -3.878         0.633       -0.630    -6.129    0.000    -5.172    -2.584 
         hp    -0.032         0.009       -0.361    -3.519    0.001    -0.050    -0.013 
----------------------------------------------------------------------------------------
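Base R's stats::step() performs the same kind of forward search on AIC. It ranks models with extractAIC(), which for lm fits differs from AIC() only by a constant when n is fixed, so the chosen model should be the same: adding drat to wt + hp raises AIC, so the search stops after two predictors. A minimal sketch:

```r
# Start from the intercept-only model and let step() add terms from the scope
null_fit <- lm(mpg ~ 1, data = mtcars)
aic_fit  <- step(null_fit,
                 scope     = ~ disp + hp + drat + wt + qsec,
                 direction = "forward",
                 trace     = 0)        # suppress the step-by-step printout
formula(aic_fit)
```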

7.3.8 Forward Stepwise, Schwarz Bayesian Criterion

olsrr::ols_step_forward_sbc(fit_mtcars_1)

                             Stepwise Summary                              
-------------------------------------------------------------------------
Step    Variable        AIC        SBC       SBIC        R2       Adj. R2 
-------------------------------------------------------------------------
 0      Base Model    208.756    211.687    114.990    0.00000    0.00000 
 1      wt            166.029    170.427     74.104    0.75283    0.74459 
 2      hp            156.652    162.515     66.271    0.82679    0.81484 
-------------------------------------------------------------------------

Final Model Output 
------------------

                         Model Summary                          
---------------------------------------------------------------
R                       0.909       RMSE                 2.469 
R-Squared               0.827       MSE                  6.095 
Adj. R-Squared          0.815       Coef. Var           12.909 
Pred R-Squared          0.781       AIC                156.652 
MAE                     1.901       SBC                162.515 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                               ANOVA                                 
--------------------------------------------------------------------
                Sum of                                              
               Squares        DF    Mean Square      F         Sig. 
--------------------------------------------------------------------
Regression     930.999         2        465.500    69.211    0.0000 
Residual       195.048        29          6.726                     
Total         1126.047        31                                    
--------------------------------------------------------------------

                                  Parameter Estimates                                    
----------------------------------------------------------------------------------------
      model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
----------------------------------------------------------------------------------------
(Intercept)    37.227         1.599                 23.285    0.000    33.957    40.497 
         wt    -3.878         0.633       -0.630    -6.129    0.000    -5.172    -2.584 
         hp    -0.032         0.009       -0.361    -3.519    0.001    -0.050    -0.013 
----------------------------------------------------------------------------------------
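The Schwarz Bayesian criterion (SBC, also called BIC) replaces AIC's penalty of 2 per parameter with log(n). With stats::step() that is the k argument; a sketch under the same assumptions as the AIC version:

```r
n        <- nrow(mtcars)
null_fit <- lm(mpg ~ 1, data = mtcars)
sbc_fit  <- step(null_fit,
                 scope     = ~ disp + hp + drat + wt + qsec,
                 direction = "forward",
                 k         = log(n),   # log(n) penalty per parameter gives SBC/BIC
                 trace     = 0)
formula(sbc_fit)
```

Because log(32) is about 3.47, SBC penalizes extra terms more heavily than AIC; on this data both criteria stop at wt and hp.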