# Chapter 17 Categorical Data and Chi-Squared Tests

**Chapter Links**

- Chapter 19-20 Slides: pdf or PowerPoint

**Unit Assignment Links**

Unit 6 Written Part: Skeleton - pdf

Unit 6 R Part: Directions - pdf and Skeleton - Rmd

Unit 6 Reading to Summarize: (Le Marchand et al. 2013) pdf on BOX or online

Inho’s Dataset: Excel

Required Packages

```
library(tidyverse) # Loads several very helpful 'tidy' packages
library(furniture) # Nice tables (by our own Tyson Barrett)
library(pander) # Nice tables in general
```

## 17.1 Goodness of Fit (1-way)

### 17.1.1 Observed Counts vs. Equally Likely Hypothesis

**TEXTBOOK Example:** *Often, especially in an experimental context, the expected frequencies are based on more abstract theoretical considerations. For instance, imagine that a developmental psychologist is studying color preference in toddlers. Each child is told that he or she can take one toy out of four that are offered. All four toys are identical except for color: red, blue, yellow, or green. Forty children are run in the experiment, and their color preferences are as follows: red, 13; blue, 9; yellow, 15; and green, 3. These are the obtained frequencies. The expected frequencies depend on the null hypothesis. If the null hypothesis is that toddlers in general have no preference for color, we would expect the choices of colors to be equally divided among the entire population of toddlers. Hence, the expected frequencies would be 10 for each color.*

Use the `chisq.test()` function to perform a Goodness-of-Fit or one-way Chi-Squared test to see if the observed counts are significantly different from being equally distributed.

NOTE: You do not need to declare any options inside the `chisq.test()` function, as the default is to use equally likely probabilities.

```
# Run the 1-way chi-square test for equally likely
chisq_toy_color <- c(red    = 13,
                     blue   = 9,
                     yellow = 15,
                     green  = 3) %>%
  chisq.test() # defaults to Equally likely
```
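To convince yourself that the default really is "equally likely," a quick check (a sketch, not required for the assignment) is to run the test once with no `p` and once with `p = rep(1/4, 4)` written out. The object names `toy_counts`, `default_test`, and `explicit_test` are only illustrative.

```r
# Observed toy-color choices from the textbook example
toy_counts <- c(red = 13, blue = 9, yellow = 15, green = 3)

# No p supplied: chisq.test() assumes every category is equally likely
default_test <- chisq.test(toy_counts)

# The same test with the equal probabilities spelled out explicitly
explicit_test <- chisq.test(toy_counts, p = rep(1/4, 4))

# Both versions give the same statistic and the same expected counts
default_test$statistic
explicit_test$statistic
default_test$expected
```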

The following code chunk shows how to create and display a table of the observed and expected counts for any 1-way Chi-squared test.

```
# Request the observed and expected counts
rbind(Observed = chisq_toy_color$observed,
      Expected = chisq_toy_color$expected) %>%
  pander::pander()
```

|          | red | blue | yellow | green |
|----------|-----|------|--------|-------|
| Observed | 13  | 9    | 15     | 3     |
| Expected | 10  | 10   | 10     | 10    |

To display the full output, type and run the name the model is saved as.

```
# Display the full output
chisq_toy_color
```

```
Chi-squared test for given probabilities
data: .
X-squared = 8.4, df = 3, p-value = 0.03843
```
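As a sketch of what `chisq.test()` computes, the X-squared value above is the Pearson statistic \(\chi^2 = \sum_i (O_i - E_i)^2 / E_i\), referred to a chi-squared distribution with \(k - 1\) degrees of freedom (\(k = 4\) colors here). The base-R lines below reproduce it by hand; the variable names are only illustrative.

```r
observed <- c(red = 13, blue = 9, yellow = 15, green = 3)

# Equally likely: each of the k categories expects n / k (40 / 4 = 10)
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson chi-squared statistic: sum of (O - E)^2 / E over the categories
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom for a goodness-of-fit test: k - 1
deg_free <- length(observed) - 1

# Upper-tail p-value from the chi-squared distribution
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

# Should match the chisq.test() output: X-squared = 8.4, df = 3, p-value = 0.03843
c(X_squared = chi_sq, df = deg_free, p_value = p_value)
```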

### 17.1.2 Observed Counts vs. Hypothesised Probabilities

**TEXTBOOK Example:** *Imagine that the population of a city is made up of three ethnic groups, which I will label A, B, and C. The obtained frequencies were 28, 18, and 2. You could test the null hypothesis that the sample is representative of a population that is one-half group A and one-third group B (which leaves one-sixth for group C).*

The `chisq.test()` function may also be used to perform a Goodness-of-Fit or one-way Chi-Squared test to see if the observed counts are significantly different from those expected under a set of hypothesised probabilities.

NOTE: You **DO** need to declare the probabilities, as the default is to use equally likely probabilities. You may do this by including `p = c(p1, p2, ..., pk)` within the `chisq.test()` function, listing one probability \(p_i\) per category in the same order as the counts. The \(p_i\)'s may be typed as decimals or fractions, but make sure they add up to exactly \(1\)!

```
# Run the 1-way chi-square test for hypothesised probabilities
chisq_ethnic <- c(A = 28,
                  B = 18,
                  C = 2) %>%
  chisq.test(p = c(1/2, 1/3, 1/6)) # declare the probabilities
```
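If the hypothesised proportions are easier to state as ratios (here 3 : 2 : 1), `chisq.test()` also accepts a `rescale.p` argument that rescales `p` to sum to 1 for you. A small sketch comparing it with typing the fractions directly (object names are illustrative):

```r
ethnic_counts <- c(A = 28, B = 18, C = 2)

# Fractions typed directly: they sum to 1
with_fractions <- chisq.test(ethnic_counts, p = c(1/2, 1/3, 1/6))

# Ratios 3:2:1 rescaled to 3/6, 2/6, 1/6 by rescale.p = TRUE
# (without rescale.p = TRUE, p = c(3, 2, 1) stops with an error,
#  because the probabilities must sum to 1)
with_ratios <- chisq.test(ethnic_counts, p = c(3, 2, 1), rescale.p = TRUE)

# Same statistic and expected counts either way
with_fractions$statistic
with_ratios$statistic
with_ratios$expected
```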

Use the same code chunk to display a table of the observed and expected counts for any 1-way Chi-squared test.

HINT: You may copy-and-paste this code for the rest of the assignment, but make sure to change the name of the model (`chisq_ethnic` appears twice, each time before the `$` sign).

```
# Request the observed and expected counts
rbind(Observed = chisq_ethnic$observed,
      Expected = chisq_ethnic$expected) %>%
  pander::pander()
```

|          | A  | B  | C |
|----------|----|----|---|
| Observed | 28 | 18 | 2 |
| Expected | 24 | 16 | 8 |

To display the full output, type and run the name the model is saved as.

```
# Display the full output
chisq_ethnic
```

```
Chi-squared test for given probabilities
data: .
X-squared = 5.4167, df = 2, p-value = 0.06665
```
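For the ethnic-groups example, each expected count is \(n \times p_i\) with \(n = 48\). A minimal sketch (illustrative names) that rebuilds the expected counts and compares the statistic with the .05 critical value from `qchisq()`: because 5.4167 falls below the critical value (equivalently, p = 0.067 > .05), the sample is not significantly different from the hypothesised proportions.

```r
ethnic_counts <- c(A = 28, B = 18, C = 2)
hyp_p <- c(1/2, 1/3, 1/6)

# Expected count in each group is n * p_i (48 people in total)
expected <- sum(ethnic_counts) * hyp_p

# Pearson statistic and the .05 critical value for k - 1 = 2 df
chi_sq   <- sum((ethnic_counts - expected)^2 / expected)
critical <- qchisq(0.95, df = length(ethnic_counts) - 1)

expected            # 24, 16, 8
round(chi_sq, 4)    # 5.4167
round(critical, 3)  # 5.991
chi_sq > critical   # FALSE: do not reject the null hypothesis at .05
```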

## 17.2 Test for Independence (2-way) - vs. Association

**TEXTBOOK Example:** *Suppose that the researcher has interviewed 30 women who have been married: 10 whose parents were divorced and 20 whose parents were married. Half of the 30 women in this hypothetical study have gone through their own divorce; the other half are still married for the first time. To know whether the divorce of a person’s parents makes the person more likely to divorce, we need to see the breakdown in each category; that is, how many currently divorced women come from “broken” homes and how many do not, and similarly for those still married. These frequency data are generally presented in a contingency (or cross-classification) table:*

The counts need to be stored as a table before you can run a two-way Chi-Squared test.

```
# Store the data as a table
woman_parents <- data.frame(home_broken   = c(7, 3),
                            home_complete = c(8, 12),
                            row.names     = c("self_divorced", "self_married")) %>%
  as.matrix() %>%
  as.table()
```
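When data arrive as one row per person (as in most data files) rather than as counts, base R's `table()` cross-tabulates two columns into the same kind of contingency table. The 30 rows below are not real participant records; they are generated with `rep()` purely so the cell counts match the textbook table, and the names `women` and `women_table` are hypothetical.

```r
# Hypothetical person-level data: one row per woman, built with rep()
# so the cell counts match the textbook table (7, 3, 8, 12)
women <- data.frame(
  self = c(rep("self_divorced", 7), rep("self_married", 3),
           rep("self_divorced", 8), rep("self_married", 12)),
  home = c(rep("home_broken", 10), rep("home_complete", 20))
)

# Cross-tabulate the two variables into a 2 x 2 contingency table
women_table <- table(women$self, women$home)
women_table

# The table can go straight into chisq.test()
chisq.test(women_table, correct = FALSE)$statistic
```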

```
# Display the observed counts
woman_parents %>%
  addmargins() %>%
  pander::pander()
```

|               | home_broken | home_complete | Sum |
|---------------|-------------|---------------|-----|
| self_divorced | 7           | 8             | 15  |
| self_married  | 3           | 12            | 15  |
| Sum           | 10          | 20            | 30  |

The `chisq.test()` function may also be used to perform a two-way Chi-Squared test for independence. In this case, the observed counts are compared to those expected if there is no association between the two factors.

```
# Run the 2-way chi-square test for independence
chisq_divorces <- woman_parents %>%
  chisq.test(correct = FALSE) # if 2x2, add correct = FALSE (turns off Yates' continuity correction)
```
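For a 2 x 2 table, `chisq.test()` applies Yates' continuity correction by default (`correct = TRUE`), which shrinks each \(|O - E|\) by (at most) 0.5 before squaring; `correct = FALSE` gives the ordinary Pearson statistic. A quick comparison on the same counts (object names are illustrative):

```r
counts <- matrix(c(7, 3, 8, 12), nrow = 2,
                 dimnames = list(c("self_divorced", "self_married"),
                                 c("home_broken", "home_complete")))

uncorrected <- chisq.test(counts, correct = FALSE) # plain Pearson statistic
corrected   <- chisq.test(counts)                  # Yates (default for 2 x 2)

uncorrected$statistic # 2.4
corrected$statistic   # 1.35: each |O - E| = 2 shrinks to 1.5 before squaring
```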

To display the counts expected if the variables are independent, start with the model name and add `$expected` at the end. Then pipe the result into `pander::pander()` to print the counts (you may also pipe through `addmargins()` first if you want row and column totals).

```
# Request the expected counts based on "no association"
chisq_divorces$expected %>%
  pander::pander()
```

|               | home_broken | home_complete |
|---------------|-------------|---------------|
| self_divorced | 5           | 10            |
| self_married  | 5           | 10            |
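Under independence, each expected count is \(E_{ij} = R_i C_j / N\) (row total times column total, divided by the grand total), e.g. \(15 \times 10 / 30 = 5\) for the top-left cell. A base-R sketch using `outer()` (names are illustrative):

```r
counts <- matrix(c(7, 3, 8, 12), nrow = 2,
                 dimnames = list(c("self_divorced", "self_married"),
                                 c("home_broken", "home_complete")))

# Expected count for each cell: (row total x column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
expected # self_divorced: 5, 10; self_married: 5, 10
```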

To display the full output, type and run the name the model is saved as.

```
# Display the full output
chisq_divorces
```

```
Pearson's Chi-squared test
data: .
X-squared = 2.4, df = 1, p-value = 0.1213
```
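The two-way statistic can be checked the same way: sum \((O - E)^2 / E\) over all four cells and use \((r - 1)(c - 1) = 1\) degree of freedom. A sketch with illustrative names:

```r
counts <- matrix(c(7, 3, 8, 12), nrow = 2)

# Expected counts under independence: row total x column total / N
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic summed over all cells
chi_sq <- sum((counts - expected)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1)(c - 1)
deg_free <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail p-value; should match X-squared = 2.4, df = 1, p-value = 0.1213
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)
c(X_squared = chi_sq, df = deg_free, p_value = p_value)
```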