Chapter 17 Categorical Data and Chi Squared Tests
Chapter Links
- Chapter 19-20 Slides: pdf or power point
Unit Assignment Links
Unit 6 Writen Part: Skeleton - pdf
Unit 6 R Part: Directions - pdf and Skeleton - Rmd
Unit 6 Reading to Summarize: (Le Marchand et al. 2013) pdf on BOX or online
Inho’s Dataset: Excel
Required Packages
library(tidyverse) # Loads several very helpful 'tidy' packages
library(furniture) # Nice tables (by our own Tyson Barrett)
library(pander) # Nice tables in genderal
17.1 Goodenss of Fit (1-way)
17.1.1 Observed Counts vs. Equally Likely Hypothesis
TEXTBOOK Example: Often, especially in an experimental context, the expected frequencies are based on more abstract theoretical considerations. For instance, imagine that a developmental psychologist is studying color preference in toddlers. Each child is told that he or she can take one toy out of four that are offered. All four toys are identical except for color: red, blue, yellow, or green. Forty children are run in the experiment, and their color preferences are as follows: red, 13; blue, 9; yellow, 15; and green, 3. These are the obtained frequencies. The expected frequencies depend on the null hypothesis. If the null hypothesis is that toddlers in general have no preference for color, we would expect the choices of colors to be equally divided among the entire population of toddlers. Hence, the expected frequencies would be 10 for each color.
Use the chisq.test()
function to perform a Goodnes-of-Fit or one-way Chi-Squared test to see if the observed counts are significantly different from being equally distributed.
NOTE: You do not need to declare any options inside the
chisq.test()
function, as the default is to use equally likely probabilities.
# Run the 1-way chi-square test for equally likely
chisq_toy_color <- c(red = 13,
blue = 9,
yellow = 15,
green = 3) %>%
chisq.test() # defaults to Equally likely
The following code chunk shows how to create and display a table of the observed and expected counts for any 1-way Chi-squated test.
# Request the observed and expected counts
rbind(Observed = chisq_toy_color$observed,
Expected = chisq_toy_color$expected) %>%
pander::pander()
red | blue | yellow | green | |
---|---|---|---|---|
Observed | 13 | 9 | 15 | 3 |
Expected | 10 | 10 | 10 | 10 |
To display the full output, type and run the name the model is save as.
# Diplay the full output
chisq_toy_color
Chi-squared test for given probabilities
data: .
X-squared = 8.4, df = 3, p-value = 0.03843
17.1.2 Observed counts vs. Hypothesised Probabilities
TEXTBOOK Example: Imagine that the population of a city is made up of three ethnic groups, which I will label A, B, and C. The obtained frequencies were 28, 18, and 2. You could test the null hypothesis that sample is representatve of a population proportions which is half group A and a third group B.
The chisq.test()
function may also be used to perform a Goodnes-of-Fit or one-way Chi-Squared test to see if the observed counts are significantly different from thoes expected from a set of hypothesised probabilies.
NOTE: You DO need to declare the probabilities, as the default is to use equally likely probabilities. You may do this by including
p = c(
\(p_1\),
\(p_2\), ...,
\(p_k\))
within thechisq.test()
function. The \(p_i\)’s maybe typed as decimals or fractions, but make suer they add up to exactly \(1\)!
# Run the 1-way chi-square test for hypothesised probabilityes
chisq_ethnic <- c(A = 28,
B = 18,
C = 2) %>%
chisq.test(p = c(1/2, 1/3, 1/6)) # declare the probabilities
Use the same code chunk to display a table of the observed and expected counts for any 1-way Chi-squated test.
HINT You may copy-and-paste this code for the rest of the assignment, but make sure to change the name of the model (
chisq_ethnic
appears twice before the $-sign).
# Request the observed and expected counts
rbind(Observed = chisq_ethnic$observed,
Expected = chisq_ethnic$expected) %>%
pander::pander()
A | B | C | |
---|---|---|---|
Observed | 28 | 18 | 2 |
Expected | 24 | 16 | 8 |
To display the full output, type and run the name the model is save as.
# Diplay the full output
chisq_ethnic
Chi-squared test for given probabilities
data: .
X-squared = 5.4167, df = 2, p-value = 0.06665
17.2 Test for Independence (2-way) - vs. Association
TEXTBOOK Example: Suppose that the researcher has interviewed 30 women who have been married: 10 whose parents were divorced and 20 whose parents were married. Half of the 30 women in this hypothetical study have gone through their own divorce; the other half are still married for the first time. To know whether the divorce of a person’s parents makes the person more likely to divorce, we need to see the breakdown in each category- that is, how many currently divorced women come from “broken” homes and how many do not, and similarly for those still married. These frequency data are generally presented in a contingency (or cross-classification) table:
The dataset needs to be declared a table before you can run a Chi-Squared Test
# Store the data as a table
woman_parents <- data.frame(home_broken = c(7, 3),
home_complete = c(8, 12),
row.names = c("self_divorced", "self_married")) %>%
as.matrix() %>%
as.table()
# Display the observed counts
woman_parents %>%
addmargins() %>%
pander::pander()
home_broken | home_complete | Sum | |
---|---|---|---|
self_divorced | 7 | 8 | 15 |
self_married | 3 | 12 | 15 |
Sum | 10 | 20 | 30 |
The chisq.test()
function may also be used to perform a two-way Chi-Squared test for independence. In this case, the observed counts are compared to thoes expected if there is no association between the two factors.
# Run the 2-way chi-square test for independence
chisq_divorces <- woman_parents %>%
chisq.test(correct = FALSE) #IF 2x2, add correct = FALSE
To display the counts expected if the variables are independent, start with the model name and add $expected
at the end. Then pipe on both the addmargins()
and pander::pander()
functions to print the counts.
# Request the expected counts based on "no association"
chisq_divorces$expected %>%
pander::pander()
home_broken | home_complete | |
---|---|---|
self_divorced | 5 | 10 |
self_married | 5 | 10 |
To display the full output, type and run the name the model is save as.
# Diplay the full output
chisq_divorces
Pearson's Chi-squared test
data: .
X-squared = 2.4, df = 1, p-value = 0.1213