This example is taken from an experiment listed in the R help files under ?CO2
.
“An experiment on the cold tolerance of the grass species Echinochloa crus-galli was conducted. The CO2 uptake of six plants from Quebec and six plants from Mississippi was measured at several levels of ambient CO2 concentration. Half the plants of each type were chilled overnight before the experiment was conducted.” Plants were considered tolerant to the cold if they were still able to acheive high CO2 uptake values after being chilled.
knitr::kable(head(CO2))
Plant | Type | Treatment | conc | uptake |
---|---|---|---|---|
Qn1 | Quebec | nonchilled | 95 | 16.0 |
Qn1 | Quebec | nonchilled | 175 | 30.4 |
Qn1 | Quebec | nonchilled | 250 | 34.8 |
Qn1 | Quebec | nonchilled | 350 | 37.2 |
Qn1 | Quebec | nonchilled | 500 | 35.3 |
Qn1 | Quebec | nonchilled | 675 | 39.2 |
Ignoring the Plant ID for a moment, there are three factors that possibly effect the uptake
of a plant. These include Type
, Treatment
, and conc
. The factor Type
has two levels, Quebec and Mississippi. The factor Treatment
has two levels chilled and nonchilled and the factor conc
has seven levels 95, 175, 250, 350, 500, 675, and 1000. An ANOVA could be performed on this data using the model
\[ y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijkl} \ \text{where} \ \epsilon_{ijkl}\sim N(0,\sigma) \]
This analysis would be performed in R by running the code
CO2.aov <- aov(uptake ~ Type + Treatment + as.factor(conc), data=CO2)
summary(CO2.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Type 1 3366 3366 196.50 <2e-16 ***
## Treatment 1 988 988 57.69 7e-11 ***
## as.factor(conc) 6 4069 678 39.59 <2e-16 ***
## Residuals 75 1285 17
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that in the summary output the factors of Type
, Treatment
, and conc
all show significant p-values. This claims that each of these factors significantly affect the uptake level. In other words, if you recall the hypotheses of ANOVA, the conclusion is that at least one level of each significant factor (in this case all factors are significant) has a different average value of uptake
from the other levels of that factor. We can demonstrate the results of the ANOVA test by showing the plots for each factor.
# Plot demonstrating Type:
xyplot( uptake ~ Type, data=CO2, main="", jitter.x=TRUE, pch=16)
# Plot demonstrating Treatment:
xyplot( uptake ~ Treatment, data=CO2, main="", jitter.x=TRUE, pch=16)
# Plot demonstrating conc:
xyplot( uptake ~ conc, data=CO2, main="", jitter.x=TRUE, pch=16)
Of course, these results are only meaningful if the ANOVA is appropriate. To assess the appropriateness of performing the current ANOVA on these data we need to check the residuals. This is done as shown below. The two assumptions are certainly questionable. The left plot shows that the constant variance may not be satisfied and the right plot shows that normality is uncertain. Official tests of these requirements confirm the difficulty of using ANOVA on these data by showing that at the 0.05 level, normality should not be assumed while the constant variance technically could be.
par(mfrow=c(1,2))
plot(CO2.aov, which=1:2)
# Test for constant variance:
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
bptest(CO2.aov)
##
## studentized Breusch-Pagan test
##
## data: CO2.aov
## BP = 15.926, df = 8, p-value = 0.04345
# Test for normality
shapiro.test(CO2.aov$residuals)
##
## Shapiro-Wilk normality test
##
## data: CO2.aov$residuals
## W = 0.98025, p-value = 0.2247
This is certainly a frustration and needs to be resolved before conclusions about the data can be reached. A common reason that the assumptions of an ANOVA are not satisfied is that there are important interactions that have not been included in the ANOVA model. Consider how the plot (made previously but repeated here for convenience) of uptake ~ conc
shows the possibility of an interaction due to the separate groups of data (a low group and a high group).
# Repeated plot demonstrating conc:
xyplot( uptake ~ as.factor(conc), data=CO2, main="", jitter.x=TRUE, pch=16)
Thus, we could perform an ANOVA that expands the model to include all possible interactions between Type
, Treatment
, and Uptake
.
CO2int.aov <- aov(uptake ~ Type * Treatment * as.factor(conc), data=CO2)
summary(CO2int.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Type 1 3366 3366 399.758 < 2e-16 ***
## Treatment 1 988 988 117.368 2.32e-15 ***
## as.factor(conc) 6 4069 678 80.548 < 2e-16 ***
## Type:Treatment 1 226 226 26.812 3.15e-06 ***
## Type:as.factor(conc) 6 374 62 7.412 7.24e-06 ***
## Treatment:as.factor(conc) 6 101 17 1.999 0.0811 .
## Type:Treatment:as.factor(conc) 6 112 19 2.216 0.0547 .
## Residuals 56 471 8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As shown in the output summary above, the interactions of Type:Treatment
and Type:conc
are significant. However, the interactions of Treatment:conc
and Type:Treatment:conc
are not significant. Notice that the residual plots still show some difficulties. There is still more to this story that will need to be resolved.
plot(CO2int.aov, which=1:2)