16 GEE, Binary Outcome: Respiratory Illness

16.1 PREPARATION

16.1.1 Load Packages

install.packages("devtools")
devtools::install_github("sarbearschwartz/apaSupp") # updated 10/14/2025

Load/activate these packages

library(tidyverse)    
library(flextable)
library(apaSupp)      # Not on CRAN, on GitHub (see above)
library(psych) 
library(lme4)         
library(lmerTest)
library(corrplot)      
library(geepack)       
library(MuMIn)  
library(interactions)
library(performance)      

library(HSAUR)        # package with the dataset

16.1.2 Background

This dataset was used as an example in Chapter 11 of “A Handbook of Statistical Analysis using R” by Brian S. Everitt and Torsten Hothorn. The authors include this data set in their HSAUR package on CRAN.

The Background

In each of two centers, eligible patients were randomly assigned to active treatment or placebo. During the treatment, the respiratory status (categorized poor or good) was determined at each of four, monthly visits. The trial recruited 111 participants (54 in the active group, 57 in the placebo group) and there were no missing data for either the responses or the covariates.

The Research Question

The question of interest is to assess whether the treatment is effective and to estimate its effect.

Note: We are NOT interested in change over time, but rather mean differences in the treatment group compared to the placebo group, net of any potential confounding due to age, sex, and site.

The Data

Note that the data (555 observations on the following 7 variables) are in long form, i.e, repeated measurements are stored as additional rows in the data frame.

  • Indicators

  • subject the patient ID, a factor with levels 1 to 111

  • centre the study center, a factor with levels 1 and 2

  • month the month, each patient was examined at months 0, 1, 2, 3 and 4

  • Outcome or dependent variable

  • status the respiratory status (response variable), a factor with levels poor and good

  • Main predictor or independent variable of interest

  • treatment the treatment arm, a factor with levels placebo and treatment

  • Time-invariant Covariates to control for

  • sex a factor with levels female and male

  • age the age of the patient

16.1.3 Read in the data

data("respiratory", package = "HSAUR")

str(respiratory)
'data.frame':   555 obs. of  7 variables:
 $ centre   : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ treatment: Factor w/ 2 levels "placebo","treatment": 1 1 1 1 1 1 1 1 1 1 ...
 $ sex      : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
 $ age      : num  46 46 46 46 46 28 28 28 28 28 ...
 $ status   : Factor w/ 2 levels "poor","good": 1 1 1 1 1 1 1 1 1 1 ...
 $ month    : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 1 2 3 4 5 1 2 3 4 5 ...
 $ subject  : Factor w/ 111 levels "1","2","3","4",..: 1 1 1 1 1 2 2 2 2 2 ...
tibble::glimpse(respiratory)
Rows: 555
Columns: 7
$ centre    <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ treatment <fct> placebo, placebo, placebo, placebo, placebo, placebo, placeb…
$ sex       <fct> female, female, female, female, female, female, female, fema…
$ age       <dbl> 46, 46, 46, 46, 46, 28, 28, 28, 28, 28, 23, 23, 23, 23, 23, …
$ status    <fct> poor, poor, poor, poor, poor, poor, poor, poor, poor, poor, …
$ month     <ord> 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, …
$ subject   <fct> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, …

16.1.4 Wide Format

Wide format has one line per participant.

df_resp_wide <- respiratory %>% 
  tidyr::spread(key = month,
                value = status,
                sep = "_") %>% 
  dplyr::rename("BL_status" = "month_0") %>% 
  dplyr::arrange(subject) %>% 
  dplyr::select(subject, centre, 
                sex, age, treatment, 
                BL_status, starts_with("month")) %>% 
  dplyr::mutate_if(is.factor, ~forcats::fct_relabel(.x, stringr::str_to_title))
tibble::glimpse(df_resp_wide)
Rows: 111
Columns: 10
$ subject   <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ centre    <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ sex       <fct> Female, Female, Female, Female, Male, Female, Female, Female…
$ age       <dbl> 46, 28, 23, 44, 13, 34, 43, 28, 31, 37, 30, 14, 23, 30, 20, …
$ treatment <fct> Placebo, Placebo, Treatment, Placebo, Placebo, Treatment, Pl…
$ BL_status <fct> Poor, Poor, Good, Good, Good, Poor, Poor, Poor, Good, Good, …
$ month_1   <fct> Poor, Poor, Good, Good, Good, Poor, Good, Poor, Good, Poor, …
$ month_2   <fct> Poor, Poor, Good, Good, Good, Poor, Poor, Poor, Good, Good, …
$ month_3   <fct> Poor, Poor, Good, Good, Good, Poor, Good, Poor, Good, Good, …
$ month_4   <fct> Poor, Poor, Good, Poor, Good, Poor, Good, Poor, Good, Poor, …
df_resp_wide %>% 
  psych::headTail(top = 10, bottom = 10) %>% 
  flextable::flextable() %>% 
  apaSupp::theme_apa(caption = "Wide Format")
Table 16.1
Wide Format

subject

centre

sex

age

treatment

BL_status

month_1

month_2

month_3

month_4

1

1

Female

46

Placebo

Poor

Poor

Poor

Poor

Poor

2

1

Female

28

Placebo

Poor

Poor

Poor

Poor

Poor

3

1

Female

23

Treatment

Good

Good

Good

Good

Good

4

1

Female

44

Placebo

Good

Good

Good

Good

Poor

5

1

Male

13

Placebo

Good

Good

Good

Good

Good

6

1

Female

34

Treatment

Poor

Poor

Poor

Poor

Poor

7

1

Female

43

Placebo

Poor

Good

Poor

Good

Good

8

1

Female

28

Treatment

Poor

Poor

Poor

Poor

Poor

9

1

Female

31

Treatment

Good

Good

Good

Good

Good

10

1

Female

37

Placebo

Good

Poor

Good

Good

Poor

...

102

2

Female

48

Placebo

Good

Good

Poor

Poor

Poor

103

2

Female

20

Treatment

Poor

Good

Good

Good

Good

104

2

Male

39

Placebo

Good

Poor

Good

Poor

Poor

105

2

Female

28

Treatment

Poor

Good

Poor

Poor

Poor

106

2

Male

38

Placebo

Poor

Poor

Poor

Poor

Poor

107

2

Female

43

Treatment

Good

Good

Good

Good

Good

108

2

Male

39

Treatment

Poor

Good

Good

Good

Good

109

2

Female

68

Treatment

Poor

Good

Good

Good

Good

110

2

Male

63

Treatment

Good

Good

Good

Good

Good

111

2

Female

31

Treatment

Good

Good

Good

Good

Good

16.1.5 Long Format

Long format has one line per observation.

df_resp_long <- df_resp_wide%>% 
  tidyr::gather(key = month,
                value = status,
                starts_with("month")) %>% 
  dplyr::mutate(month = str_sub(month, start = -1) %>% as.numeric) %>% 
  dplyr::mutate(status = case_when(status == "Poor" ~ 0,
                                   status == "Good" ~ 1)) %>% 
  dplyr::arrange(subject, month) %>% 
  dplyr::select(subject, centre, sex, age, treatment, BL_status, month, status)
tibble::glimpse(df_resp_long)
Rows: 444
Columns: 8
$ subject   <fct> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, …
$ centre    <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ sex       <fct> Female, Female, Female, Female, Female, Female, Female, Fema…
$ age       <dbl> 46, 46, 46, 46, 28, 28, 28, 28, 23, 23, 23, 23, 44, 44, 44, …
$ treatment <fct> Placebo, Placebo, Placebo, Placebo, Placebo, Placebo, Placeb…
$ BL_status <fct> Poor, Poor, Poor, Poor, Poor, Poor, Poor, Poor, Good, Good, …
$ month     <dbl> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, …
$ status    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, …
df_resp_long %>% 
  psych::headTail(top = 10, bottom = 10) %>% 
  flextable::flextable() %>% 
  apaSupp::theme_apa(caption = "Long Format")
Table 16.2
Long Format

subject

centre

sex

age

treatment

BL_status

month

status

1

1

Female

46

Placebo

Poor

1

0

1

1

Female

46

Placebo

Poor

2

0

1

1

Female

46

Placebo

Poor

3

0

1

1

Female

46

Placebo

Poor

4

0

2

1

Female

28

Placebo

Poor

1

0

2

1

Female

28

Placebo

Poor

2

0

2

1

Female

28

Placebo

Poor

3

0

2

1

Female

28

Placebo

Poor

4

0

3

1

Female

23

Treatment

Good

1

1

3

1

Female

23

Treatment

Good

2

1

...

...

...

109

2

Female

68

Treatment

Poor

3

1

109

2

Female

68

Treatment

Poor

4

1

110

2

Male

63

Treatment

Good

1

1

110

2

Male

63

Treatment

Good

2

1

110

2

Male

63

Treatment

Good

3

1

110

2

Male

63

Treatment

Good

4

1

111

2

Female

31

Treatment

Good

1

1

111

2

Female

31

Treatment

Good

2

1

111

2

Female

31

Treatment

Good

3

1

111

2

Female

31

Treatment

Good

4

1

16.2 EXPLORATORY DATA ANLAYSIS

16.2.1 Summary Statistics

16.2.1.1 Demographics and Baseline Measure

Notice that numerical summaries are computed for all variables, even the categorical variables (i.e. factors). The have an * after the variable name to remind you that the mean, sd, and se are of limited use.

Notice: the mean age is 33

df_resp_wide %>% 
  apaSupp::tab_desc(caption = "Description of Participant Age")
Table 16.3
Description of Participant Age

NA

M

SD

min

Q1

Mdn

Q3

max

age

0

33.28

13.65

11.00

23.00

31.00

43.00

68.00

Note. N = 111. NA = not available or missing; Mdn = median; Q1 = 25th percentile; Q3 = 75th percentile.

df_resp_wide %>%  
  dplyr::select(treatment,
                "Center" = centre, 
                "Sex" = sex, 
                "Age" = age, 
                "Baseline Status" = BL_status) %>% 
  apaSupp::tab1(split = "treatment",
                caption = "Participant Demographics",
                p_note = "apa1")
Table 16.4
Participant Demographics

Total
N = 111

Placebo
n = 57 (51.4%)

Treatment
n = 54 (48.6%)

p-value

Center

1.000

1

56 (50.5%)

29 (50.9%)

27 (50.0%)

2

55 (49.5%)

28 (49.1%)

27 (50.0%)

Sex

.028*

Female

88 (79.3%)

40 (70.2%)

48 (88.9%)

Male

23 (20.7%)

17 (29.8%)

6 (11.1%)

Age

33.28 (13.65)

33.65 (13.45)

32.89 (13.98)

.771

Baseline Status

1.000

Poor

61 (55.0%)

31 (54.4%)

30 (55.6%)

Good

50 (45.0%)

26 (45.6%)

24 (44.4%)

Note. Continuous variables are summarized with means (SD) and significant group differences assessed via independent t-tests. Categorical variables are summarized with counts (%) and significant group differences assessed via Chi-squared tests for independence.

* p < .05.

16.2.1.2 Status Over Time

df_resp_wide %>% 
  dplyr::select(treatment,
                "Month One" = month_1, 
                "Month Two" = month_2, 
                "Month Three" = month_3, 
                "Month Four" = month_4) %>% 
  apaSupp::tab1(split = "treatment",
                caption = "Respiratory Status Over Time",
                p_note = "apa2")
Table 16.5
Respiratory Status Over Time

Total
N = 111

Placebo
n = 57 (51.4%)

Treatment
n = 54 (48.6%)

p-value

Month One

.060

Poor

46 (41.4%)

29 (50.9%)

17 (31.5%)

Good

65 (58.6%)

28 (49.1%)

37 (68.5%)

Month Two

.002**

Poor

51 (45.9%)

35 (61.4%)

16 (29.6%)

Good

60 (54.1%)

22 (38.6%)

38 (70.4%)

Month Three

.008**

Poor

46 (41.4%)

31 (54.4%)

15 (27.8%)

Good

65 (58.6%)

26 (45.6%)

39 (72.2%)

Month Four

.068

Poor

52 (46.8%)

32 (56.1%)

20 (37.0%)

Good

59 (53.2%)

25 (43.9%)

34 (63.0%)

Note. . .

** p < .01.

Correlation between repeated observations:

df_resp_wide %>% 
  dplyr::select(starts_with("month")) %>% 
  dplyr::mutate_all(function(x) x == "Good") %>% 
  cor() %>% 
  corrplot::corrplot.mixed()

16.2.2 Visualization

16.2.2.1 Age

df_resp_wide %>% 
  apaSupp::spicy_histos(age,
                        split = treatment,
                        lab = "Age in Years")

16.2.2.2 Status by Age

df_resp_wide %>% 
  dplyr::mutate(n_good = furniture::rowsums(month_1 == "Good", 
                                            month_2 == "Good",
                                            month_3 == "Good",
                                            month_4 == "Good")) %>% 
  ggplot(aes(x = age,
             y = n_good)) +
  geom_count() +
  geom_smooth() +
  theme_bw() +
  labs(x = "Age in Years",
       y = "Number of Visits out of Four\nwith 'Good' Respiration")

16.2.2.3 Status Over Time

It appears that status is fairly constant over time.

df_resp_long %>% 
  dplyr::group_by(treatment, month) %>% 
  dplyr::summarise(n = n(),
                   prop_good = mean(status),
                   prop_sd = sd(status),
                   prop_se = prop_sd/sqrt(n)) %>% 
  ggplot(aes(x = month,
             y = prop_good,
             group = treatment,
             color = treatment)) +
  geom_errorbar(aes(ymin = prop_good - prop_se,
                    ymax = prop_good + prop_se),
                width = .25,
                show.legend = FALSE) +
  geom_point(aes(shape = treatment),
             size = 2) +
  geom_line(aes(linetype = treatment)) +
  theme_bw() +
  labs(x = "Time, months",
       y = "Observed Proportion of Participants\nwith 'Good' Respiratory Status",
       color = NULL,
       shape = NULL,
       linetype = NULL) +
  scale_linetype_manual(values = c("solid", "longdash")) +
  theme(legend.position = "bottom",
        legend.key.width = unit(1.5, "cm"))

It is NOT the purpose of this analysis to investigate change over time!

Since status is largely stable over time, no linear (or even quadratic) effect of the month variable will be modeled.

Instead, the four observations on each subject are treated as correlated (at least with non-independent correlation structure in GEE), but no time trend will be included.

16.3 Generalized Estimating Equations (GEE)

Again, since participants age is likely to be a risk at either end of the spectrum, the potential quadratic effect of age will be modeled. Age is also being grand-mean centered to make the intercept more meaningful.

16.3.1 Indepdendence

Using an "independence" correlation structure is equivalent to using a GLM analysis (logistic regression in this case) and is never appropriate for repeated measures data. It is only being done here for comparison purposes.

fit_geeglm_in <- geepack::geeglm(status ~ centre + treatment + sex + BL_status + 
                                   I(age-33) + I((age-33)^2),
                                 data = df_resp_long,
                                 family = binomial(link = "logit"),
                                 id = subject,
                                 waves = month,
                                 corstr = "independence",
                                 scale.fix = TRUE)

summary(fit_geeglm_in)

Call:
geepack::geeglm(formula = status ~ centre + treatment + sex + 
    BL_status + I(age - 33) + I((age - 33)^2), family = binomial(link = "logit"), 
    data = df_resp_long, id = subject, waves = month, corstr = "independence", 
    scale.fix = TRUE)

 Coefficients:
                     Estimate    Std.err   Wald Pr(>|W|)    
(Intercept)        -1.9685725  0.4457014 19.508 1.00e-05 ***
centre2             0.5347938  0.3795760  1.985 0.158858    
treatmentTreatment  1.3561814  0.3777999 12.886 0.000331 ***
sexMale             0.4263433  0.4832337  0.778 0.377630    
BL_statusGood       1.9193401  0.3772812 25.881 3.63e-07 ***
I(age - 33)        -0.0368535  0.0150120  6.027 0.014091 *  
I((age - 33)^2)     0.0025169  0.0007592 10.989 0.000917 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = independence 
Scale is fixed.

Number of clusters:   111  Maximum cluster size: 4 

The results for GEE fit with the independence correlation structure produces results that are nearly identical to the GLM model.

16.3.2 Exchangeable

fit_geeglm_ex <- geepack::geeglm(status ~ centre + treatment + sex + BL_status + 
                                   I(age-33) + I((age-33)^2),
                                 data = df_resp_long,
                                 family = binomial(link = "logit"),
                                 id = subject,
                                 waves = month,
                                 corstr = "exchangeable",
                                 scale.fix = TRUE)

summary(fit_geeglm_ex)

Call:
geepack::geeglm(formula = status ~ centre + treatment + sex + 
    BL_status + I(age - 33) + I((age - 33)^2), family = binomial(link = "logit"), 
    data = df_resp_long, id = subject, waves = month, corstr = "exchangeable", 
    scale.fix = TRUE)

 Coefficients:
                    Estimate   Std.err  Wald Pr(>|W|)    
(Intercept)        -1.968572  0.445701 19.51  1.0e-05 ***
centre2             0.534794  0.379576  1.99  0.15886    
treatmentTreatment  1.356181  0.377800 12.89  0.00033 ***
sexMale             0.426343  0.483234  0.78  0.37763    
BL_statusGood       1.919340  0.377281 25.88  3.6e-07 ***
I(age - 33)        -0.036854  0.015012  6.03  0.01409 *  
I((age - 33)^2)     0.002517  0.000759 10.99  0.00092 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Scale is fixed.

  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha    0.312  0.0521
Number of clusters:   111  Maximum cluster size: 4 

16.3.3 Auto-Regressive

fit_geeglm_ar <- geepack::geeglm(status ~ centre + treatment + sex + BL_status + 
                                   I(age-33) + I((age-33)^2),
                                 data = df_resp_long,
                                 family = binomial(link = "logit"),
                                 id = subject,
                                 waves = month,
                                 corstr = "ar1",
                                 scale.fix = TRUE)

summary(fit_geeglm_ar)

Call:
geepack::geeglm(formula = status ~ centre + treatment + sex + 
    BL_status + I(age - 33) + I((age - 33)^2), family = binomial(link = "logit"), 
    data = df_resp_long, id = subject, waves = month, corstr = "ar1", 
    scale.fix = TRUE)

 Coefficients:
                    Estimate   Std.err  Wald Pr(>|W|)    
(Intercept)        -1.949751  0.451399 18.66  1.6e-05 ***
centre2             0.627506  0.375800  2.79  0.09496 .  
treatmentTreatment  1.286413  0.377526 11.61  0.00066 ***
sexMale             0.388172  0.485677  0.64  0.42415    
BL_statusGood       1.942647  0.374568 26.90  2.1e-07 ***
I(age - 33)        -0.033646  0.014935  5.08  0.02427 *  
I((age - 33)^2)     0.002306  0.000755  9.32  0.00227 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = ar1 
Scale is fixed.

  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha    0.428  0.0552
Number of clusters:   111  Maximum cluster size: 4 

16.3.4 Paramgeter Estimates Table

apaSupp::tab_gees(list("Independent" = fit_geeglm_in,
                       "Exchangable" = fit_geeglm_ex,
                       "Autoregress" = fit_geeglm_ar),
                  narrow = TRUE)
Table 16.6
Compare GEE Models

Independent

Exchangable

Autoregress

OR

95% CI

OR

95% CI

OR

95% CI

centre

1

2

1.71

[0.81, 3.59]

1.71

[0.81, 3.59]

1.87

[0.90, 3.91]

treatment

Placebo

Treatment

3.88

[1.85, 8.14]***

3.88

[1.85, 8.14]***

3.62

[1.73, 7.59]***

sex

Female

Male

1.53

[0.59, 3.95]

1.53

[0.59, 3.95]

1.47

[0.57, 3.82]

BL_status

Poor

Good

6.82

[3.25, 14.28]***

6.82

[3.25, 14.28]***

6.98

[3.35, 14.54]***

I(age - 33)

0.96

[0.94, 0.99]*

0.96

[0.94, 0.99]*

0.97

[0.94, 1.00]*

I((age - 33)^2)

1.00

[1.00, 1.00]***

1.00

[1.00, 1.00]***

1.00

[1.00, 1.00]**

Note.

* p < .05. ** p < .01. *** p < .001.

16.3.5 Compare Models

MuMIn::QIC(fit_geeglm_in, 
           fit_geeglm_ex, 
           fit_geeglm_ar, 
           typeR = FALSE)
              QIC
fit_geeglm_in 491
fit_geeglm_ex 491
fit_geeglm_ar 492

16.3.5.1 Manual Extraction

fit_geeglm_ex %>% coef() %>% exp()
       (Intercept)            centre2 treatmentTreatment            sexMale 
             0.140              1.707              3.881              1.532 
     BL_statusGood        I(age - 33)    I((age - 33)^2) 
             6.816              0.964              1.003 
trt_EST <- summary(fit_geeglm_ex)$coeff["treatmentTreatment", "Estimate"] 
trt_EST
[1] 1.36
exp(trt_EST)
[1] 3.88
Trt_SE <- summary(fit_geeglm_ex)$coeff["treatmentTreatment", "Std.err"] 
Trt_SE
[1] 0.378
trt_95ci <- trt_EST + c(-1, +1)*1.96*Trt_SE
trt_95ci
[1] 0.616 2.097
exp(trt_95ci)
[1] 1.85 8.14

16.3.6 Final Model

16.3.6.1 Estimates on both the logit and odds-ratio scales

apaSupp::tab_gee(fit_geeglm_ex,
                 caption = "GEE: Final Model (exchangable)")
Table 16.7
GEE: Final Model (exchangable)

Odds Ratio

Logit Scale

OR

95% CI

b

(SE)

p

centre

1

2

1.71

[0.81, 3.59]

0.53

(0.38)

.159

treatment

Placebo

Treatment

3.88

[1.85, 8.14]

1.36

(0.38)

< .001***

sex

Female

Male

1.53

[0.59, 3.95]

0.43

(0.48)

.378

BL_status

Poor

Good

6.82

[3.25, 14.28]

1.92

(0.38)

< .001***

I(age - 33)

0.96

[0.94, 0.99]

-0.04

(0.02)

.014*

I((age - 33)^2)

1.00

[1.00, 1.00]

0.00

(0.00)

< .001***

(Intercept)

-1.97

(0.45)

< .001***

Note. N = 444 observations on 111 participants. Correlation structure: exchangeable

* p < .05. ** p < .01. *** p < .001.

16.3.6.2 Interpretation

  • centre: Controlling for baseline status, sex, age, and treatment, participants at the two centers did not significantly differ in respiratory status during the intervention

  • sex: Controlling for baseline status, center, age, and treatment, a participant’s respiratory status did not differ between the two sexes.

  • BL_status: Controlling for sex, center, age, and treatment, those with good baseline status had nearly 7 times higher odds of having a good respiratory status, compared to participants that starts out poor.

  • age: Controlling for baseline status, sex, center, and treatment, the role of age was non-linear, such that the odds of a good respiratory status was lowest for patients age 40 and better for those that were either younger or older.

Most importantly:

  • treatment: Controlling for baseline status, sex, age, and center, those on the treatment had 3.88 time higher odds of having a good respiratory status, compared to similar participants that were randomized to the placebo.

16.3.7 Predicted Probabilities

16.3.7.1 Make predictions

What is the change a 40 year old man in poor condition at center 1 change of being rated as being in “Good” respiratory condition?

fit_geeglm_ex %>% 
  emmeans::emmeans(pairwise ~ treatment,
                   at = list(centre = "1",
                             sex = "Male",
                             age = 40,
                             BL_status = "Poor"),
                   type = "response")
$emmeans
 treatment  prob     SE  df lower.CL upper.CL
 Placebo   0.158 0.0548 437   0.0767    0.296
 Treatment 0.421 0.1160 437   0.2211    0.650

Covariance estimate used: vbeta 
Confidence level used: 0.95 
Intervals are back-transformed from the logit scale 

$contrasts
 contrast            odds.ratio     SE  df null t.ratio p.value
 Placebo / Treatment      0.258 0.0973 437    1  -3.590  0.0004

Tests are performed on the log odds ratio scale 

A 40 year old man in poor condition at center 1 has a 15.8% change of being rated as being in “Good” respiratory condition if he was randomized to placebo.

A 40 year old man in poor condition at center 1 has a 42.1% change of being rated as being in “Good” respiratory condition if he was randomized to placebo.

The odds ratio for treatment is:

(.421/(1 -.421) )/( .158/(1 - .158))
[1] 3.87

16.3.8 Marginal Plot to Visualize the Model

16.3.8.1 Quickest Version

The interactions::interact_plot() function can only investigate 3 variables at once:

  • pred the x-axis (must be continuous)
  • modx different lines (may be categorical or continuous)
  • mod2 side-by-side panels (may be categorical or continuous)

All other variables must be held constant.

interactions::interact_plot(model = fit_geeglm_ex,   # model name
                            pred = age,               # x-axis
                            modx = treatment,         # lines
                            mod2 = sex,               # panels
                            data = df_resp_long,
                            at = list(centre = "1",
                                      BL_status = "Good"),  # hold constant
                            type = "mean_subject") +
  theme_bw()

16.3.8.2 Full Version

Makes a dataset with all combinations

fit_geeglm_ex %>% 
  emmeans::emmeans(~ centre + treatment + sex + age + BL_status,
                   at = list(age = c(20, 35, 50)),
                   type = "response") %>% 
  data.frame()
   centre treatment    sex age BL_status  prob     SE  df lower.CL upper.CL
1       1   Placebo Female  20      Poor 0.257 0.0655 437   0.1494    0.404
2       2   Placebo Female  20      Poor 0.371 0.1211 437   0.1751    0.620
3       1 Treatment Female  20      Poor 0.572 0.0709 437   0.4310    0.703
4       2 Treatment Female  20      Poor 0.696 0.0901 437   0.4975    0.841
5       1   Placebo   Male  20      Poor 0.346 0.1033 437   0.1773    0.565
6       2   Placebo   Male  20      Poor 0.474 0.1190 437   0.2610    0.697
7       1 Treatment   Male  20      Poor 0.672 0.1335 437   0.3840    0.871
8       2 Treatment   Male  20      Poor 0.778 0.0996 437   0.5301    0.916
9       1   Placebo Female  35      Poor 0.116 0.0472 437   0.0504    0.245
10      2   Placebo Female  35      Poor 0.183 0.0916 437   0.0628    0.427
11      1 Treatment Female  35      Poor 0.337 0.0661 437   0.2214    0.476
12      2 Treatment Female  35      Poor 0.465 0.1104 437   0.2662    0.675
13      1   Placebo   Male  35      Poor 0.167 0.0567 437   0.0827    0.309
14      2   Placebo   Male  35      Poor 0.255 0.0845 437   0.1252    0.451
15      1 Treatment   Male  35      Poor 0.438 0.1191 437   0.2313    0.669
16      2 Treatment   Male  35      Poor 0.571 0.1126 437   0.3501    0.767
17      1   Placebo Female  50      Poor 0.134 0.0588 437   0.0539    0.295
18      2   Placebo Female  50      Poor 0.209 0.1013 437   0.0732    0.468
19      1 Treatment Female  50      Poor 0.375 0.0844 437   0.2281    0.549
20      2 Treatment Female  50      Poor 0.506 0.1102 437   0.3008    0.709
21      1   Placebo   Male  50      Poor 0.191 0.0674 437   0.0913    0.358
22      2   Placebo   Male  50      Poor 0.288 0.0862 437   0.1502    0.480
23      1 Treatment   Male  50      Poor 0.479 0.1261 437   0.2539    0.713
24      2 Treatment   Male  50      Poor 0.611 0.1030 437   0.4009    0.786
25      1   Placebo Female  20      Good 0.702 0.0736 437   0.5410    0.824
26      2   Placebo Female  20      Good 0.801 0.0626 437   0.6501    0.897
27      1 Treatment Female  20      Good 0.901 0.0408 437   0.7874    0.957
28      2 Treatment Female  20      Good 0.940 0.0245 437   0.8696    0.973
29      1   Placebo   Male  20      Good 0.783 0.1022 437   0.5252    0.921
30      2   Placebo   Male  20      Good 0.860 0.0612 437   0.6936    0.944
31      1 Treatment   Male  20      Good 0.933 0.0499 437   0.7436    0.985
32      2 Treatment   Male  20      Good 0.960 0.0268 437   0.8588    0.989
33      1   Placebo Female  35      Good 0.472 0.0943 437   0.2980    0.653
34      2   Placebo Female  35      Good 0.604 0.1031 437   0.3953    0.781
35      1 Treatment Female  35      Good 0.776 0.0650 437   0.6244    0.879
36      2 Treatment Female  35      Good 0.855 0.0442 437   0.7456    0.923
37      1   Placebo   Male  35      Good 0.578 0.1205 437   0.3414    0.783
38      2   Placebo   Male  35      Good 0.700 0.0824 437   0.5192    0.835
39      1 Treatment   Male  35      Good 0.842 0.0878 437   0.5929    0.951
40      2 Treatment   Male  35      Good 0.901 0.0481 437   0.7592    0.963
41      1   Placebo Female  50      Good 0.513 0.1108 437   0.3057    0.716
42      2   Placebo Female  50      Good 0.643 0.1013 437   0.4303    0.811
43      1 Treatment Female  50      Good 0.803 0.0689 437   0.6343    0.906
44      2 Treatment Female  50      Good 0.875 0.0401 437   0.7728    0.935
45      1   Placebo   Male  50      Good 0.617 0.1242 437   0.3646    0.819
46      2   Placebo   Male  50      Good 0.734 0.0736 437   0.5678    0.852
47      1 Treatment   Male  50      Good 0.862 0.0808 437   0.6219    0.960
48      2 Treatment   Male  50      Good 0.914 0.0409 437   0.7927    0.968

In order to include all FIVE variables, we must do it the LONG way…

fit_geeglm_ex %>% 
  emmeans::emmeans(~ centre + treatment + sex + age + BL_status,
                   at = list(age = seq(from = 11, to = 68, by = 1)),
                   type = "response",
                   level = .68) %>% 
  data.frame() %>% 
  ggplot(aes(x = age,
             y = prob,
             group = interaction(sex, treatment))) +
  geom_ribbon(aes(ymin = lower.CL,
                  ymax = upper.CL,
                  fill = fct_rev(sex)),
              alpha = .3) +
  geom_line(aes(color = fct_rev(sex),
                linetype = fct_rev(treatment))) +
  theme_bw() + 
  facet_grid(centre ~ BL_status, labeller = label_both) +
  labs(x = "Age, years",
       y = "Predicted Probability of GOOD Respiratory Status",
       color = "Sex:",
       fill = "Sex:",
       linetype = "Assignment:") +
  theme(legend.position = "bottom")

16.3.8.3 Females in Center 1

This example uses default settings.

interactions::interact_plot(model = fit_geeglm_ex,
                            pred = age,
                            modx = treatment,
                            mod2 = BL_status)  +
  theme_bw()

interactions::interact_plot(model = fit_geeglm_ex,
                            pred = age,
                            modx = treatment,
                            mod2 = BL_status,
                            at = list(sex = "Female",
                                      centre = "1"))  +
  theme_bw()

16.3.8.4 Males in Center 2

This example is more preped for publication.

interactions::interact_plot(model = fit_geeglm_ex,
                            pred = age,
                            modx = treatment,
                            mod2 = BL_status,
                            at = list(sex = "Male",
                                      centre = "2"),
                            x.label = "Age in Years",
                            y.label = "Predicted Probability of 'Good' Respiratory Status",
                            legend.main = "Intervention: ",
                            mod2.labels = c("Poor at Baseline",
                                            "Good at Baseline"),
                            colors = rep("black", times = 2)) +
  theme_bw() +
  theme(legend.position = c(1, 0),
        legend.justification = c(1.1, -0.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(1.5, "cm")) +
  labs(caption = "Note: Probibilities shown are specific to males at center 2")

fit_geeglm_ex %>% 
  emmeans::emmeans(~ centre + treatment + sex + age + BL_status,
                   at = list(age = seq(from = 11, to = 68, by = 1),
                             sex = "Male",
                             centre = "2"),
                   type = "response",
                   level = .68) %>% 
  data.frame() %>% 
  ggplot(aes(x = age,
             y = prob,
             group = interaction(sex, treatment))) +
  geom_ribbon(aes(ymin = lower.CL,
                  ymax = upper.CL),
              alpha = .2) +
  geom_line(aes(linetype = fct_rev(treatment))) +
  theme_bw() + 
  facet_grid(~ BL_status) +
  labs(x = "Age, years",
       y = "Predicted Probability of\nGOOD Respiratory Status",
       color = "Sex:",
       fill = "Sex:",
       linetype = "Assignment:") +
  theme(legend.position = "bottom")

16.4 Conclusion

The Research Question

The question of interest is to assess whether the treatment is effective and to estimate its effect.

The Conclusion

After accounting for baseline status, age, sex and center, participants in the active treatment group had nearly four times higher odds of having ‘good’ respiratory status, when compared to the placebo, exp(b) = 3.881, p<.001, 95% CI [1.85, 8.14].