## Case Study 5: I can clean your data

### Background

Scientific American has argued that humans have been getting taller over time. As aspiring data scientists, we want to find data that support this claim. Our challenge is to show how male heights have changed across the centuries.

This project is not as daunting as the two quotes below suggest, but it will give you a taste of pulling data from various sources and file formats together into “tidy” data for visualization and analysis. You will not need to search for data; all the files are listed here.

1. “Classroom data are like teddy bears and real data are like a grizzly bear with salmon blood dripping out its mouth.” - Jenny Bryan
2. “Up to 80% of data analysis is spent on the process of cleaning and preparing data” - Hadley Wickham

- [ ] Use the appropriate functions from `library(haven)`, `library(readr)`, and `library(readxl)` to load the six datasets listed here.
- [ ] Tidy the Worldwide estimates `.xlsx` file.
- [ ] Make sure the file is in long format with year as a column. See here for an example of the final format.
- [ ] Use the `separate()` and `mutate()` functions to create a decade column.
- [ ] Import the other five datasets into R and combine them into one tidy dataset.
- [ ] This dataset should have the following columns: `birth_year`, `height.cm`, `height.in`, and `study_id`.
- [ ] The BLS wage data does not include birth information. Assume the respondents were born in the mid-twentieth century and use 1950 as the birth year.
- [ ] Using the data from the `.xlsx` file, make a plot with decade on the x-axis and height in inches on the y-axis, with the points from Germany highlighted.
- [ ] Create an `.Rmd` file with one or two paragraphs summarizing your graphics and explaining how they answer the driving question.
- [ ] Add your compiled `.md` and `.html` files to your Git repository.
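For the loading step, a minimal sketch of routing each file to a reader by its extension might look like the following. The helper name `read_by_extension()` is hypothetical, and the mapping assumes the files use `.csv`, `.dta`, `.sav`, and `.xls`/`.xlsx` extensions; check the actual files before relying on it. The demo writes throwaway files to a temporary directory rather than using the real datasets.

```r
library(readr)   # read_csv(), write_csv()
library(haven)   # read_dta(), read_sav(), write_dta()
library(readxl)  # read_excel()

# Hypothetical helper: choose a reader based on the file extension.
read_by_extension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = readr::read_csv(path, show_col_types = FALSE),
    dta  = haven::read_dta(path),      # Stata files
    sav  = haven::read_sav(path),      # SPSS files
    xls  = ,                           # falls through to the xlsx reader
    xlsx = readxl::read_excel(path),
    stop("No reader configured for extension: ", ext)
  )
}

# Demo with throwaway files (placeholder values, not real data).
demo <- data.frame(id = 1:3, height = c(170.2, 168.5, 175.0))
csv_path <- tempfile(fileext = ".csv")
dta_path <- tempfile(fileext = ".dta")
readr::write_csv(demo, csv_path)
haven::write_dta(demo, dta_path)

from_csv <- read_by_extension(csv_path)
from_dta <- read_by_extension(dta_path)
```

In the assignment itself you would likely call the specific reader directly for each file; the helper simply makes the extension-to-package mapping explicit.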
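For the tidying and decade steps, here is a sketch assuming the worldwide estimates arrive in a wide layout with one row per country and one column per year. That layout, the column names, and the height values are all placeholders, not the real file. `pivot_longer()` produces the long format with year as a column, and `separate()` with numeric positions splits a year such as `"1825"` into `"18"`, `"2"`, and `"5"`. `mutate()` then rebuilds the decade as `1820`.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical wide layout; values are made-up placeholders.
wide <- tibble(
  country = c("Germany", "France"),
  `1810`  = c(167.1, 164.9),
  `1825`  = c(166.5, 165.3),
  `1830`  = c(166.8, 165.0)
)

long <- wide %>%
  # One row per country-year, with year as its own column.
  pivot_longer(cols = -country, names_to = "year", values_to = "height.cm") %>%
  # Split the year string by character position: "18" | "2" | "5".
  separate(year, into = c("century", "decade", "year_in_decade"),
           sep = c(2, 3), remove = FALSE) %>%
  mutate(
    year      = as.integer(year),
    decade    = as.integer(paste0(century, decade, "0")),  # "18" + "2" + "0"
    height.in = height.cm / 2.54                           # 2.54 cm per inch
  ) %>%
  select(country, year, decade, height.cm, height.in)
```

Integer arithmetic such as `year %/% 10 * 10` would give the same decade; the `separate()` route is shown because the checklist asks for it.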
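For combining the other datasets, one approach is to map each source onto a common schema and then stack them with `bind_rows()`, letting its `.id` argument fill `study_id` from the list names. The two inputs below, along with their column names and units, are hypothetical stand-ins. The real files will need their own column mappings. The only detail taken from the assignment is the 1950 birth year assigned to the BLS data.

```r
library(tibble)
library(dplyr)

# Hypothetical inputs with inconsistent column names and units.
study_a <- tibble(birth_year = c(1750, 1760), height_cm = c(165.3, 167.8))
bls     <- tibble(height_in = c(69.0, 67.5))  # no birth information

# Map each source onto the shared columns: birth_year and height.cm.
a_std <- study_a %>%
  transmute(birth_year, height.cm = height_cm)

bls_std <- bls %>%
  # Per the assignment, assume BLS respondents were born in 1950.
  transmute(birth_year = 1950, height.cm = height_in * 2.54)

# Stack the sources; list names become the study_id values.
combined <- bind_rows(list(study_a = a_std, bls = bls_std), .id = "study_id") %>%
  mutate(height.in = height.cm / 2.54) %>%
  select(birth_year, height.cm, height.in, study_id)
```

Converting everything to centimetres first and deriving inches once at the end keeps a single source of truth for the unit conversion.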
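For the plot, one way to highlight Germany is to draw every other country in a muted colour and then layer Germany's points on top in a contrasting colour. The data frame below uses placeholder countries and values shaped like the tidy `.xlsx` output. The colour choices and the `is_germany` flag are illustrative assumptions.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical long-format data with a decade column (placeholder values).
heights <- tibble(
  country   = rep(c("Germany", "France", "Italy"), each = 3),
  decade    = rep(c(1810, 1820, 1830), times = 3),
  height.cm = c(167.1, 166.5, 166.8, 164.9, 165.3, 165.0, 163.2, 163.9, 164.1)
) %>%
  mutate(height.in  = height.cm / 2.54,
         is_germany = country == "Germany")

p <- ggplot(heights, aes(x = decade, y = height.in)) +
  # Other countries in grey as background context.
  geom_point(data = filter(heights, !is_germany), colour = "grey70") +
  # Germany drawn last so its points sit on top.
  geom_point(data = filter(heights, is_germany), colour = "firebrick", size = 2.5) +
  labs(x = "Decade", y = "Height (inches)",
       title = "Male height by decade, Germany highlighted")
```

Using two layers instead of mapping `colour = is_germany` avoids a legend and controls the drawing order, so the highlighted points are never hidden behind the rest.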