Finding data to answer your research questions is non-trivial. Except for your project, this class will shield you from this task. The large projects that data scientists work on can take years just to accumulate the data needed to address their questions.
Once you have found the correct data to address your research question, the 80/20 rule¹ comes into play. No matter how sophisticated the software or programming language, every data analyst eventually has to come face to face with data digestion. The data sets below were found at the following three websites.
The University of Tübingen appears to change its download links from time to time. If a link is broken, please post an issue.
The first file is listed under “Worldwide estimates of height by country and birth decade.”
Three other files from their website should also be used for this case study.
¹ “Up to 80% of data analysis is spent on the process of cleaning and preparing data.” - Hadley Wickham