Background / Purpose

Each of you will be responsible for creating a data-driven question, finding the data to answer it, and building a visual analysis that answers your question with that data. A few notes on this project:

  1. This project is done over a semester. If you try to complete it during the last few weeks of the semester, you will not succeed.
  2. The data science majors will submit this as a part of their degree completion. This project could be a great stepping stone for your senior project.
  3. I highly recommend that you do this project well and make it public in your GitHub repository to demonstrate to employers that you have data programming skills.


The semester project has three tasks that must be completed: question generation, data acquisition, and answer development.

Question Generation & Data Acquisition

  • Find 4-5 examples of data-driven answers and write a one-paragraph review of each.
    • List 2-3 items that are unique/good
    • Identify 1 issue with each example

  • Develop a few novel questions that data can answer
    • Get feedback from 5-10 people on their interest in your questions and summarize this feedback
    • Find other examples of people addressing your question
    • Present your question to a data scientist to get feedback on the quality of the question and whether it can be addressed in 2 months.

Answer Development

  • Review the “What do people do with new data” link above and write one quote that resonated with you in your .Rmd file.
  • Build an interactive document that has links to sources with a description of the quality of each
    • Find 3-5 potential data sources (that are free) and document some information about the source
    • Build an R script that reads in, formats, and visualizes the data using the principles of exploratory analysis
    • Write a short summary of the read-in process and some coding tricks you learned
    • Include 2-3 quick visualizations that you used to check the quality of your data
    • Summarize the limitations of your final compiled data in addressing your original question
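
As a rough illustration of the read-in, format, and quick-visualization steps listed above, here is a minimal base R sketch. It is only one possible starting point, not a required template. The file, its columns (date, site, temp_f), and every value in it are made-up placeholders; your own script would read a file from one of your data sources instead.

```r
# Minimal sketch: read in, format, and visualize a data set with base R.
# ASSUMPTION: the CSV, its column names, and its values are placeholders.

# Write a tiny placeholder CSV so the script runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "date,site,temp_f",
  "2024-01-01,A,31",
  "2024-01-02,A,NA",
  "2024-01-01,B,35",
  "2024-01-02,B,38"
), csv_path)

# Read in: keep strings as character so types are converted deliberately.
weather <- read.csv(csv_path, stringsAsFactors = FALSE)

# Format: parse dates, make site a factor, and derive Celsius.
weather$date   <- as.Date(weather$date)
weather$site   <- factor(weather$site)
weather$temp_c <- (weather$temp_f - 32) * 5 / 9

# Quality check: inspect the structure and count missing values per column.
str(weather)
missing_counts <- colSums(is.na(weather))
print(missing_counts)

# Visualize: one quick plot to spot gaps or odd values.
plot(weather$date, weather$temp_f,
     col = as.integer(weather$site), pch = 19,
     xlab = "Date", ylab = "Temperature (F)",
     main = "Quick check of placeholder temperatures")
legend("topleft", legend = levels(weather$site),
       col = seq_along(levels(weather$site)), pch = 19)
```

Counting missing values before plotting makes it easier to explain any gaps you see in the plot, and that kind of note is useful material for your summary of data limitations.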

  • Finalize first draft of your project analysis
    • Choose your flavor of .Rmd for your presentation
    • Build a stand-alone analysis that helps a reader answer the question at hand with the available data
  • Present your visualization-based analysis that addresses your question
    • Present your analysis to your roommates (or spouse) and update your presentation based on the feedback
    • Get feedback from 2-3 fellow classmates on your presentation and update it based on their feedback
    • Present your draft presentation to a data scientist to review for clarity
    • Present your work in class, at a society meeting, the research and creative works conference, or as a blog post online