5 D&H Ch5 - Dichotomous Regressors: “Weight”

Darlington & Hayes, Chapter 5’s first example

# install.packages("remotes")
# remotes::install_github("sarbearschwartz/apaSupp")
# remotes::install_github("ddsjoberg/gtsummary")
     
library(tidyverse) 
library(flextable)
library(apaSupp)
library(car)
library(rempsyc)
library(parameters)
library(performance)
library(interactions)
library(ggResidpanel)
library(olsrr)
flextable::set_flextable_defaults(digits = 2)

5.1 PURPOSE

RESEARCH QUESTION:

How is weight loss associationed with sex, after accounting for metabolism, exercise and food intake?

5.1.1 Data Description

Suppose you conducted a study examining the relationship between food consumption and weight loss among people enrolled (n = 10) in a month-long healthy living class.

5.1.1.1 Variables

Dependent Variable (DV)

  • loss average weight loss in hundreds of grams per week

Independent Variables (IVs)

  • exer average weekly hours of exercise

  • diet average daily food consumption (in 100s of calories about the recommended minimum of 1,000 calories required to maintain good health)

  • meta metabolic rate

  • sex dichotomous variable with 2 codes for Male and Female

Manually enter the data set provided on page 44 in Table 3.1

df_loss <- tibble::tribble(~id, ~exer, ~diet, ~meta, ~ sex, ~loss,
                           1, 0, 2, 15,  0,  6, 
                           2, 0, 4, 14,  0,  2,
                           3, 0, 6, 19,  0,  4,
                           4, 2, 2, 15,  1,  8,
                           5, 2, 4, 21,  1,  9,
                           6, 2, 6, 23,  0,  8,
                           7, 2, 8, 21,  1,  5,
                           8, 4, 4, 22,  1, 11,
                           9, 4, 6, 24,  0, 13,
                           10, 4, 8, 26,  0,  9) %>% 
  dplyr::mutate(sex = factor(sex,
                             levels = c(0, 1),
                             labels = c("Male", "Female")))

View the dataset

df_loss %>% 
  dplyr::select("ID" = id,
                "Exercise\nFrequency" = exer,
                "Food\nIntake" = diet,
                "Metabolic\nRate" = meta,
                "Sex" = sex,
                "Weight\nLoss" = loss) %>% 
  flextable::flextable() %>% 
  apaSupp::theme_apa(caption = "Dataset on Exercise, Food Intake, and Weight Loss",
                     general_note = "Darlington and Hayes textbook, data set provided on page 44 in Table 3.1.  Dependent variable is average weekly weight lost in 100s of grams.  Exercise captures daily average of hours.  Food intake is the average of 100's of calories above the recommendation.") %>% 
  flextable::colformat_double(digits = 0) 
``` ```{=html}
(\#tab:unnamed-chunk-187)Dataset on Exercise, Food Intake, and Weight Loss

ID

Exercise
Frequency

Food
Intake

Metabolic
Rate

Sex

Weight
Loss

1

0

2

15

Male

6

2

0

4

14

Male

2

3

0

6

19

Male

4

4

2

2

15

Female

8

5

2

4

21

Female

9

6

2

6

23

Male

8

7

2

8

21

Female

5

8

4

4

22

Female

11

9

4

6

24

Male

13

10

4

8

26

Male

9

Note. Darlington and Hayes textbook, data set provided on page 44 in Table 3.1. Dependent variable is average weekly weight lost in 100s of grams. Exercise captures daily average of hours. Food intake is the average of 100's of calories above the recommendation.

5.2 EXPLORATORY DATA ANALYSIS

5.2.1 Descriptive Statistics

5.2.1.1 Univariate

df_loss %>% 
  dplyr::select("Sex" = sex) %>% 
  apaSupp::tab_freq(caption = "Summary of Categorical Measures")
``` ```{=html}
(\#tab:unnamed-chunk-188)Summary of Categorical Measures

Statistic
(N=10)

Sex

Male

6 (60.0%)

Female

4 (40.0%)

df_loss %>% 
  dplyr::select("Exercise Frequency" = exer,
                "Food Intake" = diet,
                "Metabolic Rate" = meta,
                "Weight Loss" = loss) %>% 
  apaSupp::tab_desc(caption = "Summary for Continuous Measures") %>% 
  flextable::hline(i = 3)
``` ```{=html}
(\#tab:unnamed-chunk-189)Summary for Continuous Measures

Measure

NA

M

SD

min

Q1

Mdn

Q3

max

Exercise Frequency

0

2.00

1.63

0.00

0.50

2.00

3.50

4.00

Food Intake

0

5.00

2.16

2.00

4.00

5.00

6.00

8.00

Metabolic Rate

0

20.00

4.14

14.00

16.00

21.00

22.75

26.00

Weight Loss

0

7.50

3.31

2.00

5.25

8.00

9.00

13.00

Note. NA = not available or missing. Mdn = median. Q1 = 25th percentile, Q3 = 75th percentile. N = 10.

df_loss %>% 
  dplyr::select(sex,
                "Weight Loss" = loss,
                "Exercise Frequency" = exer,
                "Food Intake" = diet,
                "Metabolic Rate" = meta) %>% 
  apaSupp::table1_apa(caption = "Descriptive of Continuous Measures by Sex",
                      split = sex) %>% 
  flextable::bg(i = 1:3, j = 3, bg = "lightblue") %>% 
  flextable::bg(i = 1:3, j = 4, bg = "lightpink")
``` ```{=html}
(\#tab:unnamed-chunk-190)Descriptive of Continuous Measures by Sex

Total

Male

Female

p

n = 10

n = 6

n = 4

Weight Loss

.589

7.50 (3.31)

7.00 (3.90)

8.25 (2.50)

Exercise Frequency

.462

2.00 (1.63)

1.67 (1.97)

2.50 (1.00)

Food Intake

.581

5.00 (2.16)

5.33 (2.07)

4.50 (2.52)

Metabolic Rate

.887

20.00 (4.14)

20.17 (4.96)

19.75 (3.20)

Note. Continuous variables summarized by means (standard deviations) and are compared via independent ANOVA.

* p < .05. ** p < .01. *** p < .001.

5.2.1.2 Bivariate

Pearson’s correlation is only appropriate for TWO CONTINUOUS variables. The exception is when 1 variable is continuous and the other has exactly 2 levels. In this case, the binary variable needs to be converted to two numbers (numeric not factor) and the value is called the Point-Biserial Correlation (\(r_{pb}\)).

Table highlighting below:

  • YELLOW: un-adjusted correlation between each IV and the DV, at least some of them will usually be significant

  • ORANGE: pairwise correlations between each pair of IVs…if any are moderately strong, then we need to be concerned about potential multicolinearity and check the variance inflation factors (VIF).

df_loss %>% 
  dplyr::mutate(sex = as.numeric(sex == "Female")) %>% 
  dplyr::select("Weight Loss" = loss,
                "Exercise Frequency" = exer,
                "Food Intake" = diet,
                "Metabolic Rate" = meta,
                "Sex" = sex) %>% 
  apaSupp::tab_cor(caption = "Correlation Between Pairs of Measures",
                   general_note = "For pairs of variables with sex, r = Point-Biserial Correlation, otherwise ") %>% 
  flextable::hline(i = 4) %>% 
  flextable::bg(i = 1:4, bg = "yellow") %>% 
  flextable::bg(i = c(6, 8), bg = "orange") %>% 
  flextable::bold(i = c(4, 7, 9, 10))
``` ```{=html}
(\#tab:unnamed-chunk-193)Correlation Between Pairs of Measures

Variables

r

p

Weight Loss

Exercise Frequency

0.860

.001

**

Weight Loss

Food Intake

0.047

.898

Weight Loss

Metabolic Rate

0.650

.042

*

Weight Loss

Sex

0.200

.589

Exercise Frequency

Food Intake

0.380

.282

Exercise Frequency

Metabolic Rate

0.790

.007

**

Exercise Frequency

Sex

0.260

.462

Food Intake

Metabolic Rate

0.750

.013

*

Food Intake

Sex

-0.200

.581

Metabolic Rate

Sex

-0.052

.887

Note. For pairs of variables with sex, r = Point-Biserial Correlation, otherwise r = Pearson's Product-Moment correlation coefficient. N = 10.

* p < .05. ** p < .01. *** p < .001.

5.2.2 Visualizing Distributions

5.2.2.1 Univariate

df_loss %>% 
  dplyr::mutate(sex = as.numeric(sex)) %>% 
  dplyr::select(id, 
                "Sex" = sex,
                "Weight Loss\n(100 g/day)" = loss,
                "Exercise Frequency\n(hr/day)" = exer,
                "Food Intake\n(100 cal/day above 1000)" = diet,
                "Metabolic Rate" = meta) %>% 
  tidyr::pivot_longer(cols = -id) %>% 
  ggplot(aes(value)) +
  geom_histogram(binwidth = 1,
                 color = "black",
                 alpha = .25) +
  theme_bw() +
  facet_wrap(~ name,
             scale = "free_x") +
  labs(x = NULL,
       y = "Count")

Figure 5.1
Univariate Distibution of Measures

Univariate Distibution of Measures

5.2.2.2 Bivariate

df_loss %>% 
  dplyr::mutate(sex = as.numeric(sex == "Female")) %>% 
  ggplot(aes(x = sex,
             y = loss)) +
  geom_point(size = 3)  +
  geom_smooth(method = "lm",
              formula = y ~ x) +
  geom_hline(yintercept = mean(df_loss$loss),
             linetype = "longdash") +
  stat_summary(geom = "point",
               fun = "mean",
               color = "red",
               fill = "red",
               alpha = .2,
               shape = 23,
               size = 9) +
  theme_bw() +
  scale_x_continuous(breaks = 0:1) +
  labs(x = "Sex, recoded as 0 = Male and 1 = Female",
       y = "Observed Weight Loss, 100s of grams/week") 

Figure 5.2
Scatterplot for Weight Loss Regressed on Sex

Scatterplot for Weight Loss Regressed on Sex
df_loss %>% 
  ggplot(aes(x = sex,
             y = loss)) +
  geom_boxplot(fill = "gray") +
  stat_summary(geom = "point",
               fun = "mean",
               color = "red",
               shape = 18,
               size = 9) +
  theme_bw() +
  labs(x = NULL,
       y = "Observed Weight Loss\nWeekly average of 100s of grams/week") 

Figure 5.3
Boxplot for Weight Loss Distribution by Sex

Boxplot for Weight Loss Distribution by Sex

5.3 GROUP MEAN DIFFERENCE

5.3.1 Assess Homogeneity of Variance

When conducting an independent group mean difference t-test …

IF: * evidence that HOV is violated (Levene’s p < .05)

THEN: * use Welch’s adjustment to degrees of freedom and perform a seperate variance t-test instead of a standard pool-variance t-test.

car::leveneTest(loss ~ sex,
                data = df_loss,
                center = "mean")
# A tibble: 2 × 3
     Df `F value` `Pr(>F)`
  <int>     <dbl>    <dbl>
1     1      1.05    0.335
2     8     NA      NA    

The un-adjusted MEAN DIFFERENCE in weight loss by gender may be tested with a t-test

t.test(loss ~ sex,
       data = df_loss,
       var.equal = TRUE)

    Two Sample t-test

data:  loss by sex
t = -0.56269, df = 8, p-value = 0.5891
alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
95 percent confidence interval:
 -6.372702  3.872702
sample estimates:
  mean in group Male mean in group Female 
                7.00                 8.25 
cor.test(~ loss + as.numeric(sex), data = df_loss)

    Pearson's product-moment correlation

data:  loss and as.numeric(sex)
t = 0.56269, df = 8, p-value = 0.5891
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.4953645  0.7345089
sample estimates:
      cor 
0.1951181 
r <- 0.1951181 

r^2
[1] 0.03807107

5.4 REGRESSION ANALYSIS

This data was previously analyzed by regressing weight loss on the main effects of the continuous independent measures (See D & H Ch2 a).

5.4.1 Binary Predictor (IV)

5.4.1.1 Unadjusted Model

fit_loss_sex <- lm(loss ~ sex,
                    data = df_loss)

summary(fit_loss_sex)

Call:
lm(formula = loss ~ sex, data = df_loss)

Residuals:
   Min     1Q Median     3Q    Max 
 -5.00  -2.50   0.25   1.75   6.00 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)    7.000      1.405   4.982  0.00108 **
sexFemale      1.250      2.221   0.563  0.58906   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.441 on 8 degrees of freedom
Multiple R-squared:  0.03807,   Adjusted R-squared:  -0.08217 
F-statistic: 0.3166 on 1 and 8 DF,  p-value: 0.5891

5.4.1.2 Adjusted Model

fit_loss_wsex <- lm(loss ~ exer + diet + meta + sex,
                    data = df_loss)

summary(fit_loss_wsex)

Call:
lm(formula = loss ~ exer + diet + meta + sex, data = df_loss)

Residuals:
        1         2         3         4         5         6         7         8 
 0.238856 -0.894870  0.373423  0.340622  0.009251 -0.327166  0.542473 -0.892347 
        9        10 
 1.771236 -1.161480 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -0.9672     3.4552  -0.280   0.7907  
exer          1.1510     0.5066   2.272   0.0723 .
diet         -1.1333     0.3156  -3.591   0.0157 *
meta          0.5997     0.2608   2.299   0.0699 .
sexFemale    -0.4037     0.8687  -0.465   0.6617  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.166 on 5 degrees of freedom
Multiple R-squared:  0.931, Adjusted R-squared:  0.8758 
F-statistic: 16.86 on 4 and 5 DF,  p-value: 0.004163
apaSupp::tab_lms(list(fit_loss_sex, fit_loss_wsex),
                 mod_names = c("Unadjusted", "Adjusted"),
                var_labels = c("exer" = "Exercise Freq",
                               "diet" = "Food Intake",
                               "meta" = "Metabolic Rate",
                               "sex"  = "Sex"),
                caption = "Compare Unadjusted and Adjusted Models") %>% 
  flextable::bg(i = 2:4, bg = "yellow")
``` ```{=html}
(\#tab:unnamed-chunk-205)Compare Unadjusted and Adjusted Models

Unadjusted

Adjusted

Variable

b

(SE)

p

b

(SE)

p

(Intercept)

7.00

(1.40)

.001 **

-0.97

(3.46)

.791

Sex

Male

Female

1.25

(2.22)

.589

-0.40

(0.87)

.662

Exercise Freq

1.15

(0.51)

.072

Food Intake

-1.13

(0.32)

.016 *

Metabolic Rate

0.60

(0.26)

.070

0.04

0.93

Adjusted R²

-0.08

0.88

* p < .05. ** p < .01. *** p < .001.

apaSupp::tab_lm(fit_loss_wsex, 
                fit = c("r.squared", "adj.r.squared",
                        "statistic", "df", "df.residual", "p.value"),
                var_labels = c("exer" = "Exercise Freq",
                               "diet" = "Food Intake",
                               "meta" = "Metabolic Rate",
                               "sex"  = "Sex"),
                caption = "Parameter Estimtates for Weight Loss Regressed Exercise Frequency, Food Intake, and Sex",
                d = 3) %>% 
  flextable::bg(i = 5:7, bg = "yellow")
``` ```{=html}
(\#tab:unnamed-chunk-206)Parameter Estimtates for Weight Loss Regressed Exercise Frequency, Food Intake, and Sex

Variable

b

(SE)

p

b*

𝜂²

𝜂ₚ²

(Intercept)

-0.967

(3.455)

.791

Exercise Freq

1.151

(0.507)

.072

0.568

.071

.508

Food Intake

-1.133

(0.316)

.016 *

-0.740

.178

.721

Metabolic Rate

0.600

(0.261)

.070

0.750

.073

.514

Sex

.003

.041

Male

Female

-0.404

(0.869)

.662

0.931

Adjusted R²

0.876

Statistic

16.864

df

4.000

Residual df

5.000

p-value

0.004

Note. b* = standardized estimate. 𝜂² = semi-partial correlation. 𝜂ₚ² = partial correlation.

* p < .05. ** p < .01. *** p < .001.

Estimated Regression Coefficients show RELATIONSHIPS

  • Continuous IV: slope, rise per run
  • Categorical IV: difference in group means

INTERPRETATION:

After adjusting or correcting for difference in exercise frequency, food intake, and metobalic rate, females lost nearly 404 g/wk less on average than their male counterparts although this was not a statistically significant difference, b = -0.40, SE = 0.87, p = .662.

5.4.2 Model to Visualize

5.4.2.1 Fit Model

fit_loss_plot <- lm(loss ~ exer + diet + sex,
                    data = df_loss)

summary(fit_loss_plot)

Call:
lm(formula = loss ~ exer + diet + sex, data = df_loss)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2353 -0.7101  0.2269  0.8698  1.5210 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   6.5714     1.4264   4.607  0.00366 **
exer          2.1261     0.3628   5.860  0.00109 **
diet         -0.5840     0.2700  -2.163  0.07376 . 
sexFemale    -1.0084     1.0840  -0.930  0.38813   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.527 on 6 degrees of freedom
Multiple R-squared:  0.858, Adjusted R-squared:  0.7871 
F-statistic: 12.09 on 3 and 6 DF,  p-value: 0.005915

5.4.2.2 Equations

General Equation:

\[ \widehat{loss} = 6.571 + 2.127(exer) - 0.584(diet) - 1.008(sex) \]

Equation for MALES:

\[ \widehat{loss} = 6.571 + 2.127(exer) - 0.584(diet) - 1.008(0) \\ \widehat{loss} = 6.571 + 2.127(exer) - 0.584(diet) - 0\\ \widehat{loss} = 6.571 + 2.127(exer) - 0.584(diet) \]

Equation for FEAMLES:

\[ \widehat{loss} = 6.571 + 2.127(exer) - 0.584(diet) - 1.008(1) \\ \widehat{loss} = 6.571 + 2.127(exer) - 0.584(diet) - 1.008 \\ \widehat{loss} = 5.563 + 2.127(exer) - 0.584(diet) \]

5.4.2.3 3D Plot

Figure 5.4
Darlington & Hayes: Figure 5.2, page 130

Darlington & Hayes: Figure 5.2, page 130

5.4.2.4 2D Plots

There is always more than one way to arrange a plot when there are TWO OR MORE predictors (IVs).

interactions::interact_plot(model = fit_loss_plot,
                            pred = exer,
                            modx = sex,
                            mod2 = diet,
                            mod2.values = c(2, 8),
                            mod2.labels = c("1200 Calories/day",
                                            "1800 Calories/day"),
                            interval = TRUE,
                            legend.main = "Sex",
                            x.label = "Exercise, hr/wk",
                            y.label = "Estimated Marginal Mean\nWeight Loss, 100g/wk") + 
  theme_bw() +
  theme(legend.position = "inside",
        legend.position.inside = c(0, 1),
        legend.justification = c(-0.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(1.5, "cm"))

Figure 5.5
Option A

Option A
interactions::interact_plot(model = fit_loss_plot,
                            pred = diet,
                            modx = sex,
                            mod2 = exer,
                            mod2.values = c(0, 2, 4),
                            mod2.labels = c("Exercise = 0 hr/wk",
                                            "Exercise = 2 hr/wk",
                                            "Exercise = 4 hr/wk"),
                            interval = TRUE,
                            legend.main = "Sex",
                            x.label = "Food Intake, cal/day",
                            y.label = "Estimated Marginal Mean\nWeight Loss, 100g/wk") + 
  theme_bw() +
  scale_x_continuous(breaks = c(2, 4, 6, 8),
                     labels = c(1200, 1400, 1600, 1800)) +
  theme(legend.position = "inside",
        legend.position.inside = c(0, 1),
        legend.justification = c(-0.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(1.5, "cm")) 

Figure 5.6
Option B

Option B
interactions::interact_plot(model = fit_loss_plot,
                            pred = diet,
                            modx = exer,
                            modx.values = c(0, 2, 4),
                            modx.labels = c("0 hr/wk",
                                            "2 hr/wk",
                                            "4 hr/wk"),
                            mod2 = sex,
                            mod2.labels = c("Male", "Female"),
                            interval = TRUE,
                            legend.main = "Exercise",
                            x.label = "Food Intake, cal/day",
                            y.label = "Estimated Marginal Mean\nWeight Loss, 100g/wk") + 
  theme_bw() +
  scale_x_continuous(breaks = c(2, 4, 6, 8),
                     labels = c(1200, 1400, 1600, 1800)) +
  theme(legend.position = "inside",
        legend.position.inside = c(1, 1),
        legend.justification = c(1.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(1.5, "cm")) 

Figure 5.7
Option C

Option C

5.5 MULTIDIMENSIONAL SETS

5.5.1 Fit Models

Divide the predictors (IVs) into two sets as folows:

  • “Set A” (meta and sex) demographic or biological,hard to change

  • “Set B” (exer and diet) modifiable behavioral, can be targeted

fit_loss_none <- lm(loss ~ 1,                        data = df_loss)
fit_loss_setA <- lm(loss ~ meta + sex,               data = df_loss)
fit_loss_setB <- lm(loss ~ exer + diet,              data = df_loss)
fit_loss_all4 <- lm(loss ~ meta + sex + exer + diet, data = df_loss)

5.5.2 Table of Parameter Estimates

apaSupp::tab_lms(list(fit_loss_setA, fit_loss_setB, fit_loss_all4),
                 mod_names = c("Set A",
                               "Set B",
                               "A & B"),
                 var_labels = c("exer" = "Exercise Freq",
                                "diet" = "Food Intake",
                                "meta" = "Metabolic Rate",
                                "sex"  = "Sex"),
                 caption = "Compare Model Parameter Estimates") %>% 
  flextable::bg(i = 2:5, bg = "lightgreen") %>% 
  flextable::bg(i = 6:7, bg = "violet")
``` ```{=html}
(\#tab:unnamed-chunk-215)Compare Model Parameter Estimates

Set A

Set B

A & B

Variable

b

(SE)

p

b

(SE)

p

b

(SE)

p

(Intercept)

-3.67

(4.56)

.448

6.00

(1.27)

.002 **

-0.97

(3.46)

.791

Metabolic Rate

0.53

(0.22)

.047 *

0.60

(0.26)

.070

Sex

Male

Female

1.47

(1.76)

.430

-0.40

(0.87)

.662

Exercise Freq

2.00

(0.33)

< .001 ***

1.15

(0.51)

.072

Food Intake

-0.50

(0.25)

.088

-1.13

(0.32)

.016 *

0.47

0.84

0.93

Adjusted R²

0.32

0.79

0.88

* p < .05. ** p < .01. *** p < .001.

5.5.3 Compare Fit Statistics

performance::compare_performance(fit_loss_none,
                                 fit_loss_setA, 
                                 fit_loss_setB, 
                                 fit_loss_all4) %>% 
  data.frame() %>% 
  dplyr::mutate(Model = c("None", "Set A", "Set B", "A & B")) %>% 
  dplyr::select(Model, AIC, BIC, R2, R2_adjusted, RMSE, Sigma) %>% 
  flextable::flextable() %>% 
  apaSupp::theme_apa(caption = "Compare Models Fit Measures")
``` ```{=html}
(\#tab:unnamed-chunk-216)Compare Models Fit Measures

Model

AIC

BIC

R2

R2_adjusted

RMSE

Sigma

None

55.25

55.86

0.00

0.00

3.14

3.31

Set A

52.82

54.03

0.47

0.32

2.28

2.72

Set B

41.08

42.29

0.84

0.79

1.26

1.51

A & B

36.52

38.33

0.93

0.88

0.82

1.17

5.5.4 Multicolinearity Checks

Refer back to the pairwise correlations in the exploratory data anlaysis section above

car::vif(fit_loss_setA)
    meta      sex 
1.002713 1.002713 
car::vif(fit_loss_setB)
    exer     diet 
1.166667 1.166667 
car::vif(fit_loss_all4)
    meta      sex     exer     diet 
7.706476 1.332212 4.531259 3.077096 

Most multivariate statistical approaches involve decomposing a correlation matrix into linear combinations of variables. The linear combinations are chosen so that the first combination has the largest possible variance (subject to some restrictions we won’t discuss), the second combination has the next largest variance, subject to being uncorrelated with the first, the third has the largest possible variance, subject to being uncorrelated with the first and second, and so forth. The variance of each of these linear combinations is called an eigenvalue. Collinearity is spotted by finding 2 or more variables that have large proportions of variance (.50 or more) that correspond to large condition indices. A rule of thumb is to label as large those condition indices in the range of 30 or larger.

olsrr::ols_coll_diag(fit_loss_all4)
Tolerance and Variance Inflation Factor
---------------------------------------
  Variables Tolerance      VIF
1      meta 0.1297610 7.706476
2 sexFemale 0.7506313 1.332212
3      exer 0.2206892 4.531259
4      diet 0.3249818 3.077096


Eigenvalue and Condition Index
------------------------------
   Eigenvalue Condition Index    intercept         meta  sexFemale         exer
1 4.135512700        1.000000 0.0006160027 0.0002724501 0.01387363 3.978915e-03
2 0.565206086        2.704963 0.0004350066 0.0003493460 0.65741826 6.450569e-07
3 0.233631940        4.207252 0.0088906847 0.0003131313 0.04430496 2.676271e-01
4 0.062413815        8.139998 0.0559920430 0.0030556071 0.17162000 2.400058e-02
5 0.003235459       35.751703 0.9340662631 0.9960094655 0.11278315 7.043928e-01
         diet
1 0.002439753
2 0.009289727
3 0.004776515
4 0.458034251
5 0.525459754

5.5.5 Residual Diagnostics

Interpretation:

rempsyc::nice_assumption()

p-values < .05 imply assumptions are not respected. Diagnostic is how many assumptions are not respected for a given model or variable.

rempsyc::nice_assumptions(fit_loss_all4) %>% 
  unlist()
                                       Model 
           "loss ~ meta + sex + exer + diet" 
                    Normality (Shapiro-Wilk) 
                                     "0.475" 
            Homoscedasticity (Breusch-Pagan) 
                                     "0.122" 
Autocorrelation of residuals (Durbin-Watson) 
                                     "0.987" 
                               Diagnostic.BP 
                                         "0" 
performance::check_residuals(fit_loss_all4)
OK: Simulated residuals appear as uniformly distributed (p = 0.728).
ggResidpanel::resid_panel(fit_loss_all4)

5.6 EFFECT of VARIABLE SET

Unique Contribution: amount \(R^2\) would drop if the regressor(s) were removed from the analysis.

ss_lm <- function(model){
  ss_tot = data.frame(anova(model))$Sum.Sq[model$rank]
  ss_reg = sum(data.frame(anova(model))$Sum.Sq[-model$rank])
  ss_err = deviance(model)
  r_sqr  = summary(model)$r.squared
  r_adj  = summary(model)$adj.r.squared
  return(list(R2_Unadjusted = r_sqr,
              R2_Adjusted   = r_adj,
              SS_Total      = ss_tot, 
              SS_Regression = ss_reg, 
              SS_Error      = ss_err))
}
data.frame(Variables = c("None", "Set A", "Set B", "A & B")) %>% 
  dplyr::mutate(model = list(fit_loss_none,
                             fit_loss_setA,
                             fit_loss_setB,
                             fit_loss_all4)) %>% 
  dplyr::mutate(values = purrr::map_df(model, ss_lm)) %>% 
  dplyr::select(-model) %>% 
  tidyr::unnest(cols = values) %>% 
  flextable::flextable() %>% 
  flextable::separate_header() %>% 
  apaSupp::theme_apa(caption = "Contribution of Variables Sets A and B") %>% 
  flextable::border_inner(part = "header", border = officer::fp_border(width = 0)) %>% 
  flextable::align(part = "body", j = c(2, 4, 5), align = "right")%>% 
  flextable::align(part = "body", j = c(3, 6), align = "left")
``` ```{=html}
(\#tab:unnamed-chunk-226)Contribution of Variables Sets A and B

Variables

R2

SS

Unadjusted

Adjusted

Total

Regression

Error

None

0.00

0.00

98.50

0.00

98.50

Set A

0.47

0.32

51.77

46.73

51.77

Set B

0.84

0.79

16.00

82.50

16.00

A & B

0.93

0.88

6.80

91.70

6.80

5.6.0.1 Set B’s Unique Contriburion

Semipartial Multiple Correlation, \(SR(B \vert A)\)

  • UNIQUE contribution of set B, relative to ALL the variance in the Dependent Variable (DV)

  • Correlation between the Dependent Variable (DV) AND set B, with all the the variables in set A held constant.

  • How much \(R^2\) increase when the variables in set B are added to a model of Y already including set A variables as regressors.

\[ SR(B \vert A)^2 = R(AB)^2 - R(A)^2 \\ \]

summary(fit_loss_all4)$r.squared - summary(fit_loss_setA)$r.squared
[1] 0.4565391

Partial multiple correlation, \(PR(B \vert A)\)

  • Proportion of remaining variables explained by set B

  • UNIQUE contribution of set B, relative to the REMAINING variance in the Dependent Variable (DV) unaccounted for by set A

\[ PR(B \vert A)^2 = \frac{SR(B \vert A)^2}{1 - R(A)^2} \]

(summary(fit_loss_all4)$r.squared - summary(fit_loss_setA)$r.squared) /
  (1 - summary(fit_loss_setA)$r.squared)
[1] 0.8686927

INTERPRETATION:

Of these four factors, the onse most under a person’s control (food intake and exercise) uniquely explain about 45.7% of the varaince in weight loss, holding constant the ones less under personal control (sex and metabolism).

We can also say the factors more under personal control explain about 86.9% of the variance in weight loss that remains after accounting for the factors less under control.

5.6.1 Significance of Set B’s Contribution

5.6.2 Significance of Set B’s Contribution

anova(fit_loss_none, fit_loss_setA, fit_loss_all4)
# A tibble: 3 × 6
  Res.Df   RSS    Df `Sum of Sq`     F `Pr(>F)`
   <dbl> <dbl> <dbl>       <dbl> <dbl>    <dbl>
1      9 98.5     NA        NA    NA   NA      
2      7 51.8      2        46.7  17.2  0.00575
3      5  6.80     2        45.0  16.5  0.00625

INTERPRETATION:

After controlling for biological factors of metabolism and sex, F(2, 7) = 17.19, p = .005, the modifiable factors of exercise and food intake significantly impart weight loss, F(2, 5) = 16.54, p = .006.

5.6.3 Standardized Regression Coefficients

Standardized regression coefficients for CATEGORICAL predictors (IVs) is DISCOURAGED!

parameters::standardise_parameters(fit_loss_setB)
# A tibble: 3 × 5
  Parameter   Std_Coefficient    CI CI_low CI_high
  <chr>                 <dbl> <dbl>  <dbl>   <dbl>
1 (Intercept)        8.24e-17  0.95 -0.342  0.342 
2 exer               9.87e- 1  0.95  0.598  1.38  
3 diet              -3.26e- 1  0.95 -0.716  0.0626
parameters::standardise_parameters(fit_loss_all4)
# A tibble: 5 × 5
  Parameter   Std_Coefficient    CI  CI_low CI_high
  <chr>                 <dbl> <dbl>   <dbl>   <dbl>
1 (Intercept)          0.0488  0.95 -0.345    0.442
2 meta                 0.750   0.95 -0.0885   1.59 
3 sexFemale           -0.122   0.95 -0.797    0.553
4 exer                 0.568   0.95 -0.0747   1.21 
5 diet                -0.740   0.95 -1.27    -0.210