Case Studies (9)
Background
This culminating case study is like a “choose your own adventure” book. You get to choose the topic and the data! Your project should demonstrate the skills you learned in this class, including importing, wrangling, visualizing, and interpreting data. Specific requirements are listed in the Tasks section.
Make sure to read the Being Readings for advice on picking a project, and check out the Resources for ideas about where to find data.
Being Readings
The Being Readings for this case study are both from the book Build a Career in Data Science. If the links below don’t work for you, you can get an electronic version of the book from the BYU-Idaho library.
Optional:
- If you enjoy podcasts, you can also try listening to episode 4 of the authors’ podcast, which covers some (but not all!) of the material from Chapter 4.
- David Robinson has a fantastic blog post on the benefits of sharing your work in a public blog or GitHub repo.
Read the article(s) and come to class with two or three things to share. These could be a favorite quote, a question you had while reading, a thought or idea inspired by the reading, etc.
Resources
Finding Data: These links are a good place to look for data. If you’re interested in a specific industry or topic, you can also try googling something like “disc golf data” or “free example healthcare dataset”. Always make sure you have permission to use the data you find!
- Data is Plural is a newsletter that sends you interesting datasets every week. Scroll through the archive and see if any topics jump out at you!
- Tidy Tuesday is a community of R users that explore and visualize a new dataset every Tuesday. This GitHub account contains every dataset used in Tidy Tuesday. Click on the “data” folder and then pick a year to start exploring.
- This blog post lists many other websites you can use in your search for data.
R Markdown Options: If you would like to change the appearance of your R Markdown report, you can learn how to change the theme here and see example themes here.
Tasks
- Find a data set to analyze.
- Try to find something you are genuinely interested in. This will make it easier to pick a question or direction to focus on.
- The data you pick may not have enough information or may not be high-quality enough to serve your purposes. If you need to, look for a different data set. It is easier to switch now than halfway through your project!
- Wrangle, analyze, and visualize your data. You should demonstrate all of the skills you learned in units 2 and 3, and you must include at least one chart. (You are also welcome to use skills from later units, or from outside the course material.) Specifically, your case study will be checked for:
- Import: Load at least one dataset from a remote source.
- Wrangle:
  - Basic Wrangle: Use at least two of the following: select(), filter(), mutate(), arrange(), slice().
  - Grouped Wrangle: Use at least two of the following: group_by(), summarize(), count().
  - Combine and Reformat: Use at least two of the following: *bind(), *join(), pivot*().
- Visualize:
  - Geometry: Use geom_*() functions appropriately to communicate your data.
  - Aesthetics: Use aes() appropriately to map variables in your data to important aesthetics in your chart.
  - Theme: Demonstrate you know how to edit theme elements to enhance the message of the chart.
- Interpret: Describe the insights gained from your wrangling and visualizations, and discuss how those insights might impact actions or decisions.
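To make the checklist above more concrete, here is a minimal sketch of how the wrangling and visualization requirements might fit together in one short pipeline. Everything in it is an assumption made for illustration: the disc golf data, the table names (rounds, players), and the column names are invented, and the data is typed in by hand so the example runs offline. A real project would import its data from a remote source instead.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in data (invented for this sketch). A real project would
# import from a remote source, e.g. readr::read_csv("https://example.com/rounds.csv"),
# where the URL is only a placeholder.
rounds <- tibble::tribble(
  ~player, ~course,     ~throws, ~par,
  "Ana",   "Hilltop",   58,      54,
  "Ana",   "Riverside", 61,      56,
  "Ben",   "Hilltop",   55,      54,
  "Ben",   "Riverside", 63,      56,
  "Cy",    "Hilltop",   60,      54
)

players <- tibble::tribble(
  ~player, ~division,
  "Ana",   "Pro",
  "Ben",   "Amateur",
  "Cy",    "Amateur"
)

# Basic wrangle: mutate() adds a score column, select() keeps what we need.
scored <- rounds %>%
  mutate(over_par = throws - par) %>%
  select(player, course, over_par)

# Combine: left_join() attaches each player's division.
# Grouped wrangle: group_by() + summarize() give one row per division.
by_division <- scored %>%
  left_join(players, by = "player") %>%
  group_by(division) %>%
  summarize(
    mean_over_par = mean(over_par),
    rounds = n(),
    .groups = "drop"
  )

# Reformat: pivot_wider() turns courses into columns, one row per player.
# Players who skipped a course get NA in that column.
wide <- scored %>%
  pivot_wider(names_from = course, values_from = over_par)

# Visualize: a geometry, aesthetic mappings, and a few theme edits.
p <- ggplot(by_division, aes(x = division, y = mean_over_par, fill = division)) +
  geom_col() +
  labs(
    title = "Average score relative to par, by division",
    x = NULL,
    y = "Mean throws over par"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
```

In an R Markdown chunk, ending the chunk with `p` (or `print(p)`) draws the chart. The legend is dropped here because the x-axis already labels each division; that is one example of a theme edit that supports the message rather than decorating it.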
Compile your code, charts, and descriptions into an R Markdown report. Make sure your report is well organized and includes section headers (Background, Conclusion, etc.) to break up different topics.
Have a friend read through your project to make sure the analysis makes sense and everything is spelled correctly.
Present this case study to your class, following the directions from your instructor.