library(rio)
library(mosaic)
library(tidyverse)
library(car)
big5 <- import('https://raw.githubusercontent.com/byuistats/Math221D_Cannon/master/Data/All_class_combined_personality_data.csv')
Summarizing Data
Multiple Groups
Introduction
In this document, we will demonstrate how to summarize quantitative data for multiple groups in a dataset.
Load the data and libraries
Summarizing a Quantitative Variable for Multiple Categories
Sometimes we would like to compare summary statistics between groups. Much of this class will be about how to make formal, rigorous comparisons between groups. But for now, let’s look at how to get different summaries of quantitative variables for multiple categories.
Summary Statistics
We can easily extend favstats() to output our favorite statistics for multiple groups. We first must identify the quantitative variable we want to summarize and the categorical variable that defines the groups. For example, we could compare agreeableness between the sexes.
# This gives us the summary statistics for Agreeableness for all respondents combined (no grouping)
favstats(big5$Agreeableness)
 min Q1 median Q3 max     mean       sd   n missing
  21 67     75 81 100 73.43457 13.24909 405       0
# Adding the '~' tells R to break the data into groups (determined by the right side of the '~') and calculate the summary statistics of the variable on the left for each group
favstats(big5$Agreeableness ~ big5$`Sex(M/F)`)
  big5$`Sex(M/F)` min Q1 median Q3 max     mean       sd   n missing
1               F  21 69     77 85 100 75.92035 12.94640 226       0
2               M  25 63     73 79  94 70.29609 12.99218 179       0
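To see what the '~' grouping does behind the scenes, here is a minimal base R sketch that splits a quantitative variable by a categorical one and summarizes each piece separately. The data frame `toy` and its columns `score` and `group` are hypothetical, with made-up values; they are not part of the class survey, and this is only an illustration of the idea, not how favstats() is implemented.

```r
# Hypothetical toy data (made-up values, not the class survey)
toy <- data.frame(
  score = c(70, 80, 90, 60, 65, 75),
  group = c("F", "F", "F", "M", "M", "M")
)

# Split `score` into one vector per level of `group`
by_group <- split(toy$score, toy$group)

# Summarize each piece separately
group_means <- sapply(by_group, mean)
group_sds   <- sapply(by_group, sd)
group_n     <- sapply(by_group, length)

# Collect the results into one table with a row per group
summary_table <- data.frame(mean = group_means, sd = group_sds, n = group_n)
print(summary_table)
```

A formula call to favstats() produces the same kind of one-row-per-group table, with the quartiles, minimum, maximum, and missing-value count included as well.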
Visual Summaries by Group
We can use the exact same formula format with boxplot() that we used with favstats():
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`)
Improving Graphs
Throughout this course, we will ease into making better visualizations. For now, here are some basic techniques that apply to most graphing functions in R:
# Changing colors by specifying `col = c()`
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, col = c("red", "blue"))
# R also accepts numbers for `col = `; each number picks a color from the current palette. Try different numbers
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, col = c(2,3))
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, col = c(4,6))
# Adding better axis labels using `xlab = ` and `ylab = `:
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, xlab = "Biosex", ylab = "Trait Agreeableness")
# Adding a title:
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`, main = "Comparing Agreeableness by Biosex")
# Putting it all together:
boxplot(big5$Agreeableness ~ big5$`Sex(M/F)`,
        main = "Comparing Agreeableness by Biosex",
        xlab = "Biosex",
        ylab = "Trait Agreeableness",
        col = c(3, 4))
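The options above can also be combined with the `data = ` argument of boxplot(), which lets the formula use bare column names instead of repeating the data frame name. The sketch below uses a hypothetical data frame `toy` with made-up values (not the class survey). It also stores the value that boxplot() returns, so the numbers behind the picture can be inspected, and looks up which palette colors the numbers 3 and 4 refer to.

```r
# Hypothetical toy data (made-up values, not the class survey)
toy <- data.frame(
  score = c(70, 80, 90, 60, 65, 75),
  group = c("F", "F", "F", "M", "M", "M")
)

# `data = toy` tells boxplot() where to find `score` and `group`
bp <- boxplot(score ~ group, data = toy,
              main = "Comparing Scores by Group",
              xlab = "Group",
              ylab = "Score",
              col = c(3, 4))

# boxplot() invisibly returns the values it used to draw the plot
bp$names  # group labels, in plotting order
bp$n      # number of observations in each box
bp$stats  # one column per group: lower whisker, lower hinge, median,
          # upper hinge, upper whisker

# The colors that the numbers 3 and 4 select from the current palette
palette()[c(3, 4)]
```

Checking `bp$stats` against the favstats() output for the same variable is a quick way to confirm that the plot and the table describe the same groups.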
Your Turn
Create summary statistics for Conscientiousness based on course section:
Create a side-by-side boxplot for Conscientiousness based on course section: