library(rio)
library(mosaic)
library(tidyverse)
library(car)
# Read the Big 5 personality data into a data frame named big5
big5 <- import('https://raw.githubusercontent.com/byuistats/Math221D_Cannon/master/Data/All_class_combined_personality_data.csv')
Quantitative Data Summaries - Multiple Groups
Introduction
In this section, we will demonstrate how to numerically and visually summarize a quantitative response variable for each level of a categorical explanatory variable.
Lesson Outcomes
- Create a table of summary statistics (favstats()) for multiple groups
- Create side-by-side boxplots comparing multiple groups
- Interpret side-by-side boxplots for group comparisons
Load the data and libraries
We will use the Big 5 Personality data from a random sample of Brother Cannon’s students.
Summarizing a Quantitative Variable for Multiple Categories
Sometimes we would like to compare summary statistics between groups. Much of this class will be about how to make formal, rigorous comparisons between groups. But for now, let’s look at how to get different summaries of quantitative variables for multiple categories.
Summary Statistics
We can easily extend favstats() to output our favorite statistics for multiple groups.
We must first identify the quantitative response variable we want to summarize, then tell R which categorical explanatory variable defines the groups to compare.
For example, we could compare agreeableness between the sexes. In this case, Agreeableness is the quantitative response variable and Sex(M/F) is the categorical explanatory variable.
# This gives us the summary statistics for Agreeableness for all students combined
favstats(big5$Agreeableness)
 min Q1 median Q3 max     mean       sd   n missing
  21 67     75 81 100 73.43457 13.24909 405       0
# Adding the '~' tells R to break the data into groups (determined by the right side of the '~') and calculate the summary statistics of the variable on the left for each group
favstats(big5$Agreeableness ~ big5$`Sex(M/F)`)
  big5$`Sex(M/F)` min Q1 median Q3 max     mean       sd   n missing
1               F  21 69     77 85 100 75.92035 12.94640 226       0
2               M  25 63     73 79  94 70.29609 12.99218 179       0
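To make the grouping step concrete, here is a minimal base R sketch of the same split-then-summarize idea. The data frame toy and its six values are invented for illustration and are not taken from the Big 5 data. Only the column name Sex(M/F) is borrowed, to show why that name has to be wrapped in backticks.

```r
# Toy data, invented for illustration only (not the Big 5 data set)
toy <- data.frame(
  Agreeableness = c(70, 80, 90, 60, 65, 75),
  sex           = c("F", "F", "F", "M", "M", "M")
)

# Rename the grouping column so it matches the non-syntactic name in big5
names(toy)[2] <- "Sex(M/F)"

# Backticks are required because the name contains parentheses and a slash.
# tapply() splits the first vector by the second, then applies the function.
group_means <- tapply(toy$Agreeableness, toy$`Sex(M/F)`, mean)
group_means

# The same idea with quantile() gives min, quartiles, and max for each group
group_quartiles <- tapply(toy$Agreeableness, toy$`Sex(M/F)`, quantile)
group_quartiles
```

With favstats() the formula does this splitting for you and returns all of the statistics in one table, which is why the formula version is usually more convenient.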
Visual Summaries by Group
We can use the exact same formula for boxplot() as we used for favstats():
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`)
NOTE: We will use the formula data$response ~ data$explanatory for LOTS of functions this semester. It will always take the form y ~ x.
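As a sketch of how the same y ~ x formula carries across functions, the example below uses ToothGrowth, a data set that ships with R, so it runs without downloading anything (it is not part of this lesson's data). It also shows the data = argument, an alternative to writing data$ on both sides of the formula. The functions aggregate() and boxplot() are base R, and favstats() from mosaic accepts a data = argument in the same way.

```r
# ToothGrowth is built into R: len is quantitative, supp is categorical
# Mean tooth length for each supplement type, written as response ~ explanatory
means_by_supp <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)
means_by_supp

# The identical formula produces side-by-side boxplots
boxplot(len ~ supp, data = ToothGrowth)

# These two calls describe the same plot; the second uses the data$ style.
# plot = FALSE returns the numbers behind the boxes instead of drawing them.
b1 <- boxplot(len ~ supp, data = ToothGrowth, plot = FALSE)
b2 <- boxplot(ToothGrowth$len ~ ToothGrowth$supp, plot = FALSE)
identical(b1$stats, b2$stats)
```

The data = form keeps axis labels shorter by default, because R labels the axes with the text of the formula.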
Improving Graphs
Throughout this course, we will ease into making better visualizations. For now, here are some basic techniques that apply to most graphing functions in R:
# Changing color by specifying `col = c()`:
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, col = c("red", "blue"))
# `col = ` also accepts numbers; each number refers to a color in R's palette. Try different numbers:
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, col = c(2,3))
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, col = c(4,6))
# Adding better axis labels using `xlab = ` and `ylab = `:
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, xlab = "Biosex", ylab = "Trait Agreeableness")
# Adding a title:
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, main = "Comparing Agreeableness by Biosex")
# Putting it all together:
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, main = "Comparing Agreeableness by Biosex", xlab = "Biosex", ylab = "Trait Agreeableness", col = c(3, 4))
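The numbers passed to col = are positions in R's current color palette. Here is a minimal sketch for looking up which color each number refers to, using the base R functions palette() and colors(). The exact colors returned depend on your R version and settings, so no output is shown here.

```r
# Show the colors that col = 1, 2, 3, ... currently refer to
current_palette <- palette()
current_palette

# Look up the two colors used in the final example above (col = c(3, 4))
current_palette[c(3, 4)]

# Named colors work too; colors() lists every built-in color name
head(colors())
"steelblue" %in% colors()
```

Color names tend to make code easier to read than palette numbers, since a number gives no hint about which color it will produce.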
Your Turn
Create summary statistics for Conscientiousness based on course section:
Create a side-by-side boxplot for Conscientiousness based on course section: