Background

This example is taken from an experiment listed in the R help files under ?CO2.

“An experiment on the cold tolerance of the grass species Echinochloa crus-galli was conducted. The CO2 uptake of six plants from Quebec and six plants from Mississippi was measured at several levels of ambient CO2 concentration. Half the plants of each type were chilled overnight before the experiment was conducted.” Plants were considered tolerant to the cold if they were still able to acheive high CO2 uptake values after being chilled.

knitr::kable(head(CO2))
Plant Type Treatment conc uptake
Qn1 Quebec nonchilled 95 16.0
Qn1 Quebec nonchilled 175 30.4
Qn1 Quebec nonchilled 250 34.8
Qn1 Quebec nonchilled 350 37.2
Qn1 Quebec nonchilled 500 35.3
Qn1 Quebec nonchilled 675 39.2

Ignoring the Plant ID for a moment, there are three factors that possibly effect the uptake of a plant. These include Type, Treatment, and conc. The factor Type has two levels, Quebec and Mississippi. The factor Treatment has two levels chilled and nonchilled and the factor conc has seven levels 95, 175, 250, 350, 500, 675, and 1000. An ANOVA could be performed on this data using the model

\[ y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijkl} \ \text{where} \ \epsilon_{ijkl}\sim N(0,\sigma) \]

This analysis would be performed in R by running the code

CO2.aov <- aov(uptake ~ Type + Treatment + as.factor(conc), data=CO2)
summary(CO2.aov)
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## Type             1   3366    3366  196.50 <2e-16 ***
## Treatment        1    988     988   57.69  7e-11 ***
## as.factor(conc)  6   4069     678   39.59 <2e-16 ***
## Residuals       75   1285      17                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that in the summary output the factors of Type, Treatment, and conc all show significant p-values. This claims that each of these factors significantly affect the uptake level. In other words, if you recall the hypotheses of ANOVA, the conclusion is that at least one level of each significant factor (in this case all factors are significant) has a different average value of uptake from the other levels of that factor. We can demonstrate the results of the ANOVA test by showing the plots for each factor.

# Plot demonstrating Type:
xyplot( uptake ~ Type, data=CO2, main="", jitter.x=TRUE, pch=16)

# Plot demonstrating Treatment:
xyplot( uptake ~ Treatment, data=CO2, main="", jitter.x=TRUE, pch=16)

# Plot demonstrating conc:
xyplot( uptake ~ conc, data=CO2, main="", jitter.x=TRUE, pch=16)

Of course, these results are only meaningful if the ANOVA is appropriate. To assess the appropriateness of performing the current ANOVA on these data we need to check the residuals. This is done as shown below. The two assumptions are certainly questionable. The left plot shows that the constant variance may not be satisfied and the right plot shows that normality is uncertain. Official tests of these requirements confirm the difficulty of using ANOVA on these data by showing that at the 0.05 level, normality should not be assumed while the constant variance technically could be.

par(mfrow=c(1,2))
plot(CO2.aov, which=1:2)

# Test for constant variance:
library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
bptest(CO2.aov)
## 
##  studentized Breusch-Pagan test
## 
## data:  CO2.aov
## BP = 15.926, df = 8, p-value = 0.04345
# Test for normality
shapiro.test(CO2.aov$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  CO2.aov$residuals
## W = 0.98025, p-value = 0.2247

This is certainly a frustration and needs to be resolved before conclusions about the data can be reached. A common reason that the assumptions of an ANOVA are not satisfied is that there are important interactions that have not been included in the ANOVA model. Consider how the plot (made previously but repeated here for convenience) of uptake ~ conc shows the possibility of an interaction due to the separate groups of data (a low group and a high group).

# Repeated plot demonstrating conc:
xyplot( uptake ~ as.factor(conc), data=CO2, main="", jitter.x=TRUE, pch=16)

Thus, we could perform an ANOVA that expands the model to include all possible interactions between Type, Treatment, and Uptake.

CO2int.aov <- aov(uptake ~ Type * Treatment * as.factor(conc), data=CO2)
summary(CO2int.aov)
##                                Df Sum Sq Mean Sq F value   Pr(>F)    
## Type                            1   3366    3366 399.758  < 2e-16 ***
## Treatment                       1    988     988 117.368 2.32e-15 ***
## as.factor(conc)                 6   4069     678  80.548  < 2e-16 ***
## Type:Treatment                  1    226     226  26.812 3.15e-06 ***
## Type:as.factor(conc)            6    374      62   7.412 7.24e-06 ***
## Treatment:as.factor(conc)       6    101      17   1.999   0.0811 .  
## Type:Treatment:as.factor(conc)  6    112      19   2.216   0.0547 .  
## Residuals                      56    471       8                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As shown in the output summary above, the interactions of Type:Treatment and Type:conc are significant. However, the interactions of Treatment:conc and Type:Treatment:conc are not significant. Notice that the residual plots still show some difficulties. There is still more to this story that will need to be resolved.

plot(CO2int.aov, which=1:2)