Hair color, eye color, and gender were recorded for 592 statistics students at the University of Delaware.
Several different questions could be asked of these data. To show several examples of how a chi-squared test can be implemented, each of the following three questions is answered with a separate chi-squared test. Any one of these questions would be sufficient for a single analysis.
Is hair color associated with eye color regardless of gender?
\[ H_{01}:\ \text{Hair color and eye color are not associated.} \] \[ H_{a1}:\ \text{Hair color and eye color are associated.} \]
Is hair color associated with gender?
\[ H_{02}:\ \text{Hair color and gender are not associated.} \] \[ H_{a2}:\ \text{Hair color and gender are associated.} \]
Is eye color associated with gender?
\[ H_{03}:\ \text{Eye color and gender are not associated.} \] \[ H_{a3}:\ \text{Eye color and gender are associated.} \]
# Test H_{01}:
HEC1 <- HairEyeColor[,,"Male"] + HairEyeColor[,,"Female"]
chi.HEC1 <- chisq.test(HEC1)
chi.HEC1
##
## Pearson's Chi-squared test
##
## data: HEC1
## X-squared = 138.3, df = 9, p-value < 2.2e-16
chi.HEC1$expected > 5
## Eye
## Hair Brown Blue Hazel Green
## Black TRUE TRUE TRUE TRUE
## Brown TRUE TRUE TRUE TRUE
## Red TRUE TRUE TRUE TRUE
## Blond TRUE TRUE TRUE TRUE
All expected counts are greater than 5, so the requirements are met. (Had this check failed, the test would still be appropriate as long as all expected counts are at least 1 and the average expected count is at least 5.)
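The relaxed rule in parentheses can be checked directly from the expected counts. The following is a minimal sketch, assuming the same built-in `HairEyeColor` table; the helper name `check_expected` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: evaluate both the strict and the relaxed requirement
check_expected <- function(test) {
  e <- test$expected
  c(all_above_5 = all(e > 5),                    # strict rule used in the text
    relaxed_ok  = all(e >= 1) && mean(e) >= 5)   # fallback rule from the text
}

# Same hair-by-eye table as HEC1 above
HEC1 <- HairEyeColor[, , "Male"] + HairEyeColor[, , "Female"]
check_expected(chisq.test(HEC1))
```

Both entries of the returned vector should be `TRUE` whenever the strict rule holds, since the strict rule implies the relaxed one.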
# Test H_{02}:
MH <- apply(HairEyeColor[,,"Male"],1,sum)
FH <- apply(HairEyeColor[,,"Female"],1,sum)
HEC2 <- cbind(MH,FH)
chi.HEC2 <- chisq.test(HEC2)
chi.HEC2
##
## Pearson's Chi-squared test
##
## data: HEC2
## X-squared = 7.994, df = 3, p-value = 0.04613
chi.HEC2$expected > 5
## MH FH
## Black TRUE TRUE
## Brown TRUE TRUE
## Red TRUE TRUE
## Blond TRUE TRUE
All expected counts are greater than 5, so the requirements are met.
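For readers wondering where the expected counts come from, each one is the row total times the column total divided by the grand total. The sketch below recomputes them by hand for the hair-by-gender table and compares the result with what `chisq.test()` stores; the name `expected_by_hand` is illustrative only.

```r
# Hair-by-gender counts, built the same way as HEC2 in the text
MH <- apply(HairEyeColor[, , "Male"], 1, sum)
FH <- apply(HairEyeColor[, , "Female"], 1, sum)
HEC2 <- cbind(MH, FH)

# Expected count for each cell = (row total * column total) / grand total
expected_by_hand <- outer(rowSums(HEC2), colSums(HEC2)) / sum(HEC2)

# Compare with the expected counts stored by chisq.test(),
# ignoring attributes so that only the numeric values are checked
all.equal(expected_by_hand, chisq.test(HEC2)$expected,
          check.attributes = FALSE)
```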
# Test H_{03}:
ME <- apply(HairEyeColor[,,"Male"],2,sum)
FE <- apply(HairEyeColor[,,"Female"],2,sum)
HEC3 <- cbind(ME,FE)
chi.HEC3 <- chisq.test(HEC3)
chi.HEC3
##
## Pearson's Chi-squared test
##
## data: HEC3
## X-squared = 1.53, df = 3, p-value = 0.6754
chi.HEC3$expected > 5
## ME FE
## Brown TRUE TRUE
## Blue TRUE TRUE
## Hazel TRUE TRUE
## Green TRUE TRUE
All expected counts are greater than 5, so the requirements are met.
barplot(HEC1, beside=TRUE, legend.text=TRUE, xlab="Eye Color", main="Eye Color vs. Hair Color")
barplot(HEC2, beside=TRUE, legend.text=TRUE, xlab="Gender", main="Gender vs. Hair Color",
names.arg=c("Male","Female"))
barplot(HEC3, beside=TRUE, legend.text=TRUE, xlab="Gender", main="Gender vs. Eye Color",
names.arg=c("Male","Female"))
(See captions below each plot.)
If all three tests were actually performed simultaneously on the same data, only the first would be considered significant, because each would need to be evaluated at the \(\alpha=0.05/3 \approx 0.0167\) level (a Bonferroni correction) to account for the multiplicity of tests. The second test's p-value of 0.04613 is below 0.05 but above 0.0167, so it would no longer be significant. The three tests were performed here simply to give three different examples in a concise way.
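The 0.05/3 threshold can be applied in R either by comparing each raw p-value to the reduced level or by adjusting the p-values with `p.adjust()`. This is a sketch rather than part of the original analysis: it rebuilds the three two-way tables with `margin.table()`, which is an alternative to the `apply()` and `cbind()` construction used above, and the object names (`hair_eye`, `hair_sex`, `eye_sex`, `p`) are illustrative.

```r
# Collapse the 3-way HairEyeColor array (Hair x Eye x Sex) to two-way tables
hair_eye <- margin.table(HairEyeColor, c(1, 2))
hair_sex <- margin.table(HairEyeColor, c(1, 3))
eye_sex  <- margin.table(HairEyeColor, c(2, 3))

# p-values for the three hypotheses
p <- c(H01 = chisq.test(hair_eye)$p.value,
       H02 = chisq.test(hair_sex)$p.value,
       H03 = chisq.test(eye_sex)$p.value)

# Option 1: compare each raw p-value to alpha / (number of tests)
p < 0.05 / 3

# Option 2: compare Bonferroni-adjusted p-values to the original alpha
p.adjust(p, method = "bonferroni") < 0.05
```

Both comparisons flag the same hypotheses, and with the p-values reported above only the first test remains significant.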