11 Ex: Ventricular Shortening Velocity

# install.packages("remotes")
# remotes::install_github("sarbearschwartz/apaSupp")
# remotes::install_github("ddsjoberg/gtsummary")

library(magrittr)       
library(tidyverse)   
library(broom)     
library(naniar)
library(corrplot)   
library(GGally)
library(gtsummary)
library(apaSupp)
library(performance)
library(interactions)
library(effects)
library(emmeans)
library(car)
library(ggResidpanel)
library(modelsummary)
library(ppcor)
library(jtools)
library(olsrr)
library(DescTools)
library(effectsize)
library(ggpubr)

library(ISwR)            # Introduction to Statistics with R (datasets)

11.1 PURPOSE

11.1.1 Research Question

Is there a relationship between fasting blood glucose and shortening of ventricular velocity among type 1 diabetic patiences? If so, what is the nature of the association?

11.1.2 Data Description

This dataset is included in the ISwR package (Dalgaard 2024), which was a companion to the texbook “Introductory Statistics with R, 2nd ed.” (Dalgaard 2008), although it was first published by Altman (1991) in Table 11.6.

The thuesen data frame has 24 rows and 2 columns. It contains ventricular shortening velocity and blood glucose for type 1 diabetic patients.

  • blood.glucose a numeric vector, fasting blood glucose (mmol/l).
  • short.velocity a numeric vector, mean circumferential shortening velocity (%/s).
data(thuesen, package = "ISwR")
tibble::glimpse(thuesen)  # view the class and 1st few values of each variable
Rows: 24
Columns: 2
$ blood.glucose  <dbl> 15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3, 11.1, 7.5, 12.2, …
$ short.velocity <dbl> 1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09, 1.18, 1…

11.2 EXPLORATORY DATA ANALYSIS

Before embarking on any inferencial anlaysis or modeling, always get familiar with your variables one at a time (univariate), as well as pairwise (bivariate).

11.2.1 Summary Statistics

11.2.1.1 Univariate

Summary Statistics for all three variables of interest

Center: mean and median Spread: standard deviation, range (max - min), interquartile range (Q3 - Q1)

thuesen %>% 
  dplyr::select("Fasting Blood Glucose" = blood.glucose, 
                "Circumferential Shortening Velocity" = short.velocity) %>% 
  apaSupp::tab_desc(caption = "Description of Diabetic Participants")
Table 11.1
Description of Diabetic Participants

Variable

NA

M

SD

min

Q1

Mdn

Q3

max

Fasting Blood Glucose

0

10.30

4.34

4.20

7.08

9.40

12.70

19.50

Circumferential Shortening Velocity

1

1.33

0.23

1.03

1.18

1.27

1.42

1.95

Note. NA = not available or missing. Mdn = median. Q1 = 25th percentile, Q3 = 75th percentile. N = 24.

11.2.1.2 Bivariate

(Unadjusted Pearson’s correlation)

The cor() fucntion in base \(R\) doesn’t like NA or missing values

thuesen %>% cor()           
               blood.glucose short.velocity
blood.glucose              1             NA
short.velocity            NA              1

You may specify how to handle cases that are missing on at least one of the variables of interest:

  • use = “everything” NAs will propagate conceptually, i.e., a resulting value will be NA whenever one of its contributing observations is NA <– DEFAULT
  • use = “all.obs” the presence of missing observations will produce an error
  • use = “complete.obs” missing values are handled by casewise deletion (and if there are no complete cases, that gives an error).
  • use = “na.or.complete” is the same as above unless there are no complete cases, that gives NA
  • use = “pairwise.complete.obs” the correlation between each pair of variables is computed using all complete pairs of observations on those variables. This can result in covariance matrices which are not positive semi-definite, as well as NA entries if there are no complete pairs for that pair of variables.

Commonly, we want listwise deletion:

thuesen %>% cor(use = "complete.obs")   # list-wise deletion
               blood.glucose short.velocity
blood.glucose      1.0000000      0.4167546
short.velocity     0.4167546      1.0000000

It is also handy to specify the number of decimal places desired, but adding a rounding step:

thuesen %>% 
  cor(use = "complete.obs") %>%   
  round(2)                       # number od decimal places
               blood.glucose short.velocity
blood.glucose           1.00           0.42
short.velocity          0.42           1.00

If you desire a correlation single value of a single PAIR of variables, instead of a matrix, then you must use a magrittr exposition pipe (%$%)

thuesen %$%                            # notice the special kind of pipe
  cor(blood.glucose, short.velocity,   # specify exactly TWO variables            
      use = "complete.obs")
[1] 0.4167546

In addition to the cor() funciton, the base \(R\) stats package also includes the cor.test() function to test if the correlation is zero (\(H_0: R = 0\))

This TESTS if the cor == 0

thuesen %$%                                 # notice the special kind of pipe
  cor.test(blood.glucose, short.velocity,   # specify exactly TWO variables            
           use="complete.obs")

    Pearson's product-moment correlation

data:  blood.glucose and short.velocity
t = 2.101, df = 21, p-value = 0.0479
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.005496682 0.707429479
sample estimates:
      cor 
0.4167546 

The default correltaion type for cor()is Pearson’s \(R\), which assesses linear relationships. Spearman’s correlation assesses monotonic relationships.

thuesen %$%                            # notice the special kind of pipe
  cor(blood.glucose, short.velocity,   # specify exactly TWO variables  
      use    = 'complete',
      method = 'spearman')       # spearman's (rho) 
[1] 0.318002

11.2.2 Visualize Distributions

11.2.2.1 Univariate

thuesen %>% 
  apaSupp::spicy_histos(var = blood.glucose)

thuesen %>% 
  apaSupp::spicy_histos(var = short.velocity)

11.2.2.2 Bivariate

ggplot(thuesen, 
       aes(x = blood.glucose,        # x-axis variable
           y = short.velocity)) +    # y-axis variable
  geom_point() +                     # place a point for each observation
  geom_smooth(method = "lm",
              formula = y ~ x) +
  ggpubr::stat_regline_equation(label.x = 5,
                                label.y = 1.85,
                                size = 6) +
  theme_bw() +
  labs(x = "Blood Glucose, mmol/l",
       y = "Mean Circumferential Shortening Velocity, %/s")

11.3 REGRESSION ANALYSIS

  • short.velocity dependent variable or outcome (\(Y\))
  • blood.glucose independent variable or predictor (\(X\))

11.3.1 Fit Model

\[ Y = \beta_0 + \beta_1 \times X \]

The lm() function must be supplied with at least two options:

  • a formula: Y ~ X
  • a dataset: data = XXXXXXX

When a model is fit and directly saved as a named object via the assignment opperator (<-), no output is produced.

fit_vel_glu <- lm(short.velocity ~ blood.glucose, 
                  data = thuesen)

Running the name of the fit object yields very little output:

fit_vel_glu

Call:
lm(formula = short.velocity ~ blood.glucose, data = thuesen)

Coefficients:
  (Intercept)  blood.glucose  
      1.09781        0.02196  

Appling the summary() function produced a good deal more output:

summary(fit_vel_glu)

Call:
lm(formula = short.velocity ~ blood.glucose, data = thuesen)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.40141 -0.14760 -0.02202  0.03001  0.43490 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    1.09781    0.11748   9.345 6.26e-09 ***
blood.glucose  0.02196    0.01045   2.101   0.0479 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2167 on 21 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1737,    Adjusted R-squared:  0.1343 
F-statistic: 4.414 on 1 and 21 DF,  p-value: 0.0479

You may request specific pieces of the output:

  • Coefficients or beta estimates:
coef(fit_vel_glu)
  (Intercept) blood.glucose 
   1.09781488    0.02196252 
  • 95% confidence intervals for the coefficients or beta estimates:
confint(fit_vel_glu)
                     2.5 %     97.5 %
(Intercept)   0.8534993816 1.34213037
blood.glucose 0.0002231077 0.04370194
  • The F-test for overall modle fit vs. a \(null\) or empty model having only an intercept and no predictors.
anova(fit_vel_glu)
# A tibble: 2 × 5
     Df `Sum Sq` `Mean Sq` `F value` `Pr(>F)`
  <int>    <dbl>     <dbl>     <dbl>    <dbl>
1     1    0.207    0.207       4.41   0.0479
2    21    0.986    0.0470     NA     NA     
  • Various other model fit indicies:
logLik(fit_vel_glu)     
'log Lik.' 3.583612 (df=3)
AIC(fit_vel_glu)
[1] -1.167223
BIC(fit_vel_glu)
[1] 2.239259

11.3.2 Assumption Checks

11.3.2.1 Residual Diagnostics

Before reporting a model, ALWAYS make sure to check the residues to ensure that the model assumptions are not violated.

  • Residuals normally distributed
  • Residuals have homogeneity of variance
  • Check for highly influential points
plot(fit_vel_glu, which = 1)

plot(fit_vel_glu, which = 2)

plot(fit_vel_glu, which = 5)

plot(fit_vel_glu, which = 6)

performance::check_residuals(fit_vel_glu)
OK: Simulated residuals appear as uniformly distributed (p = 0.259).
ggResidpanel::resid_panel(fit_vel_glu)

Viewing potentially influential or outlier points based on plots above:

thuesen %>% 
  dplyr::mutate(id = row_number()) %>% 
  dplyr::filter(id == c(13, 20, 24))
# A tibble: 3 × 3
  blood.glucose short.velocity    id
          <dbl>          <dbl> <int>
1          19             1.95    13
2          16.1           1.05    20
3           9.5           1.7     24

Here is a fancy way to visulaize ‘potential problem cases’ with ggplot2:

thuesen %>% 
  dplyr::filter(complete.cases(.)) %>%                # keep only complete cases
  ggplot() +                                          # name the FULL dataset 
  aes(x = blood.glucose,                              # x-axis variable name
      y = short.velocity) +                           # y-axis variable name
  geom_point() +                                      # do a scatterplot
  stat_smooth(method = "lm") +                        # smooth: linear model
  theme_bw()  +                                       # black-and-while theme
  geom_point(data = thuesen %>%                       # override the dataset from above
               filter(row_number() == c(13, 20, 24)), # with a reduced subset of cases
             size = 4,                                # make the points bigger in size 
             color = "red")                           # give the points a different color

11.3.3 Visualize Relationships

When a model only contains main effects, a plot is not important for interpretation, but can help understand the relationship between multiple predictors.

The Effect() function from the effects package chooses ‘5 or 6 nice values’ for your continuous independent variable (\(X\)) based on the range of values found in the dataset on which the model was fit and plugs them into the regression equation \(Y = \beta_0 + \beta_1 \times X\) to compute the predicted mean value of the outcome (\(Y\)) (Fox et al. 2022).

effects::Effect(focal.predictors = c("blood.glucose"),  # IV variable name
                mod = fit_vel_glu)                      # fitted model name

 blood.glucose effect
blood.glucose
     4.2        8       12       16       20 
1.190057 1.273515 1.361365 1.449215 1.537065 

You may override the ‘nice values’ using the xlevels = list(var_name = c(#, #, ...#) option.

effects::Effect(focal.predictors = c("blood.glucose"),
                mod = fit_vel_glu,
                xlevels = list(blood.glucose = c(5, 10, 15, 20))) 

 blood.glucose effect
blood.glucose
       5       10       15       20 
1.207627 1.317440 1.427253 1.537065 

Adding a piped data frame step (%>% data.frame()) will arrange the predicted \(Y\) values into a column called fit. This tidy data format is ready for plotting.

effects::Effect(focal.predictors = c("blood.glucose"),
                mod = fit_vel_glu) %>% 
  data.frame() 
# A tibble: 5 × 5
  blood.glucose   fit     se lower upper
          <dbl> <dbl>  <dbl> <dbl> <dbl>
1           4.2  1.19 0.0788  1.03  1.35
2           8    1.27 0.0516  1.17  1.38
3          12    1.36 0.0483  1.26  1.46
4          16    1.45 0.0742  1.29  1.60
5          20    1.54 0.110   1.31  1.77
effects::Effect(focal.predictors = c("blood.glucose"),
                mod = fit_vel_glu,
                xlevels = list(blood.glucose = c(5, 12, 20))) %>% 
  data.frame() %>% 
  ggplot() +
  aes(x = blood.glucose,           # x-axis variable
      y = fit) +                   # y-axis variable
  geom_ribbon(aes(ymin = lower,    # bottom edge of the ribbon
                  ymax = upper),   # top edge of the ribbon
              alpha = .5) +        # ribbon transparency level
  geom_line() +
  theme_bw()

Notice that although the regression line is smooth, the ribbon is choppy. This is because we are basing it on only THREE values of \(X\).

c(5, 12, 20)
[1]  5 12 20

Use the seq() function in base \(R\) to request many values of \(X\)

seq(from = 5, to = 20, by = 5)
[1]  5 10 15 20
seq(from = 5, to = 20, by = 2)
[1]  5  7  9 11 13 15 17 19
seq(from = 5, to = 20, by = 1)
 [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
seq(from = 5, to = 20, by = .5)
 [1]  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0
[16] 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5
[31] 20.0
effects::Effect(focal.predictors = c("blood.glucose"),
                mod = fit_vel_glu,
                xlevels = list(blood.glucose = seq(from = 5, to = 20, by = .5))) %>% 
  data.frame() %>% 
  ggplot() +
  aes(x = blood.glucose,           # x-axis variable
      y = fit) +                   # y-axis variable
  geom_ribbon(aes(ymin = lower,    # bottom edge of the ribbon
                  ymax = upper),   # top edge of the ribbon
              alpha = .5) +        # ribbon transparency level
  geom_line() +
  theme_bw()

Now that we are basing our ribbon on MANY more points of \(X\), the ribbon is much smoother.

For publication, you would of course want to clean up the plot a bit more:

effects::Effect(focal.predictors = c("blood.glucose"),
                mod = fit_vel_glu,
                xlevels = list(blood.glucose = seq(from = 5, to = 20, by = .5))) %>% 
  data.frame() %>% 
  ggplot() +
  aes(x = blood.glucose,           # x-axis variable
      y = fit) +                   # y-axis variable
  geom_ribbon(aes(ymin = lower,    # bottom edge of the ribbon
                  ymax = upper),   # top edge of the ribbon
              alpha = .3) +        # ribbon transparency level
  geom_line() +
  theme_bw()

The above plot has a ribbon that represents a 95% confidence interval (lower toupper) for the MEAN (fit) outcome. Sometimes we would rather display a ribbon for only the MEAN (fit) plus-or-minus ONE STANDARD ERROR (se) for the mean. You would do that by changing the variables that define the min and max edges of the ribbon (notice the range of the y-axis has changed):

effects::Effect(focal.predictors = c("blood.glucose"),
                mod = fit_vel_glu,
                xlevels = list(blood.glucose = seq(from = 5, to = 20, by = .5))) %>% 
  data.frame() %>% 
  ggplot() +
  aes(x = blood.glucose,           
      y = fit) +                   
  geom_ribbon(aes(ymin = fit - se,    # bottom edge of the ribbon
                  ymax = fit + se),   # top edge of the ribbon
              alpha = .3) +        
  geom_line() +
  theme_bw()   

Of course, you could do both ribbons together:

effects::Effect(focal.predictors = c("blood.glucose"),
                mod = fit_vel_glu,
                xlevels = list(blood.glucose = seq(from = 5, to = 20, by = .5))) %>% 
  data.frame() %>% 
  ggplot() +
  aes(x = blood.glucose,           
      y = fit) +                  
  geom_ribbon(aes(ymin = lower,    # bottom edge of the ribbon = lower of the 95% CI
                  ymax = upper),   # top edge of the ribbon = upper of the 95% CI
              alpha = .3) +        
  geom_ribbon(aes(ymin = fit - se,    # bottom edge of the ribbon = mean - SE
                  ymax = fit + se),   # top edge of the ribbon = Mean + SE
              alpha = .3) +        
  geom_line() +
  theme_bw() 

11.4 CONCLUSION

11.4.1 Methods

11.4.2 Results

There is evidence blood.glucose is associated with short.velocity.

In this sample of Type 1 Diabetic patient (n = 24), there is a moderate positive correlation between fasting blood glucose (M = 10.30 mmol/l, SD = 4.34) and shortening of ventricular velocity (M = 1.33 %/s, SD = 0.23), r = .417, p = .048. An increase of 1 mmol/l fasting blood glucose is associated with a .02 %/s increase in ventricular velocity shortening, b = 0.02, SE = 0.01, p = .048, \(R^2\) = .

11.4.3 Table

You may also present the output in a table using two different packages:

apaSupp::tab_lm(fit_vel_glu,
                var_labels = c("blood.glucose" = "Blood Glucose"),
                caption = "Parameter Estimates for Mean Circumferential Shortening Velocity Regressed on Fasting Blood Glucose",
                general_note = "Units are perceont per second for mean circumferential shortening velocity and mmol per liter for fasting blood glucose.")
Table 11.2
Parameter Estimates for Mean Circumferential Shortening Velocity Regressed on Fasting Blood Glucose

Variable

b

(SE)

p

b*

𝜂²

𝜂ₚ²

(Intercept)

1.10

(0.12)

< .001 ***

Blood Glucose

0.02

(0.01)

.048 *

0.42

.174

.174

0.17

Adjusted R²

0.13

Note. Units are perceont per second for mean circumferential shortening velocity and mmol per liter for fasting blood glucose. b* = standardized estimate. 𝜂² = semi-partial correlation. 𝜂ₚ² = partial correlation.

* p < .05. ** p < .01. *** p < .001.

11.4.4 Plot

effects::Effect(focal.predictors = c("blood.glucose"),
                mod = fit_vel_glu,
                xlevels = list(blood.glucose = seq(from = 5, to = 20, by = .5))) %>% 
  data.frame() %>% 
  ggplot() +
  aes(x = blood.glucose,           
      y = fit) +                   
  geom_ribbon(aes(ymin = fit - se,    # bottom edge of the ribbon
                  ymax = fit + se),   # top edge of the ribbon
              alpha = .3) +        
  geom_line() +
  theme_bw() +
  labs(x = "Fasting Blood Glucose (mmol/l)",
       y = "EStimated Marginal Mean\nMean Circumferential Shortening Velocity (%/s)")