Exploratory Data Analysis

J. Hathaway

Becoming the Critic.

Visualization of the Day

Great Quotes

‘There are no routine statistical questions, only questionable statistical routines.’

— Sir David Cox

‘Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.’

— John Tukey

Review

Case Study 4: Reducing Gun Deaths (FiveThirtyEight)

Take 10 minutes to brainstorm with your table: what are the data inputs, and what visualizations would you like to create?

  • What mutations or summaries will you need to do?
  • What difficulties do you expect?
  • Does each of the task items make sense?

Task 8: World Data Investigations - Part 2

Socrative Hours Quiz

socrative.com

Your research questions

Task 3:

  • Share your research question with your neighbor and explain why finding an answer to the question with data would be exciting.
  • Then we can discuss a few as a class.

What is EDA?

Socrative Quiz

Exploratory Data Analysis

EDA is fundamentally a creative process. And like most creative processes, the key to asking quality questions is to generate a large quantity of questions.

  1. What type of variation occurs within my variables?
  2. What type of covariation occurs between my variables?
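
As a rough illustration of the two questions, the sketch below uses R's built-in faithful data (Old Faithful eruptions). The 0.5-minute bin width and the object names eruption_bins and wait_duration_cor are assumptions chosen for this example, not part of the course material.

```r
library(dplyr)
library(ggplot2)  # provides cut_width()

# Variation: how are eruption durations (minutes) spread across 0.5-minute bins?
eruption_bins <- faithful %>%
  count(duration_bin = cut_width(eruptions, width = 0.5))

# Covariation: do longer waits tend to go with longer eruptions?
wait_duration_cor <- cor(faithful$waiting, faithful$eruptions)

print(eruption_bins)
print(wait_duration_cor)
```

A count table and a single correlation are only starting points; a plot usually reveals more, such as whether the durations fall into more than one cluster.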

Understanding case_when()

case_when() is particularly useful inside mutate() when you want to create a new variable that relies on a complex combination of existing variables. Write a short sentence that says what this code is doing.
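
For practice, here is a minimal sketch of case_when() inside mutate(), using the built-in faithful data. It is an illustration only, not the code from the original slide; the cutoffs (3 minutes, 68 minutes) and the labels are assumptions made for this example.

```r
library(dplyr)

faithful_labeled <- faithful %>%
  mutate(
    # Conditions are checked top to bottom; the first one that is TRUE wins.
    eruption_type = case_when(
      eruptions < 3 & waiting < 68 ~ "short eruption, short wait",
      eruptions >= 3 & waiting >= 68 ~ "long eruption, long wait",
      TRUE ~ "mixed"  # fallback for rows that match neither condition
    )
  )

print(count(faithful_labeled, eruption_type))
```

One sentence describing it: each eruption is labeled by whether both its duration and its waiting time fall below or above the chosen cutoffs, with everything else labeled "mixed".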

Old Faithful

Exploring Old Faithful goals

  1. Make the histogram shown in the book with the black and white theme and an improved x-axis label.

Exploring Old Faithful (1)
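
One possible way to reach goal 1, assuming the book's histogram is the faithful eruptions histogram with a 0.25-minute bin width. The axis label text and the name p_hist are assumptions for this sketch.

```r
library(ggplot2)

p_hist <- ggplot(data = faithful, mapping = aes(x = eruptions)) +
  geom_histogram(binwidth = 0.25) +
  theme_bw() +                                  # black and white theme
  labs(x = "Eruption duration (minutes)")       # clearer x-axis label

print(p_hist)
```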

Exploring Old Faithful goals (2)

  1. Make the histogram shown in the book with the black and white theme and an improved x-axis label.
  2. Use the mutate function to modify our plot to fill the histogram for two groups of waiting times.

Exploring Old Faithful (3)
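
A sketch of goal 2 under stated assumptions: the 68-minute cutoff, the group labels, and the names wait_group, faithful_grouped, and p_fill are all choices made for this example, and your own split of waiting times may differ.

```r
library(dplyr)
library(ggplot2)

# Create a two-level grouping variable from waiting time (cutoff is an assumption).
faithful_grouped <- faithful %>%
  mutate(wait_group = if_else(waiting < 68,
                              "Short wait (< 68 min)",
                              "Long wait (>= 68 min)"))

# Map the new variable to fill so each bar is split by group.
p_fill <- ggplot(faithful_grouped, aes(x = eruptions, fill = wait_group)) +
  geom_histogram(binwidth = 0.25) +
  theme_bw() +
  labs(x = "Eruption duration (minutes)", fill = "Waiting time")

print(p_fill)
```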

Exploring Old Faithful goals (4)

  1. Make the histogram shown in the book with the black and white theme and an improved x-axis label.
  2. Use the mutate function to modify our plot to fill the histogram for two groups of waiting times.
  3. Use the waiting variable to make a hexbin plot of the relationship between waiting time and duration.

Exploring Old Faithful (5)
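
A sketch of goal 3. geom_hex() needs the hexbin package to be installed; the choice of 20 bins, the axis labels, and the name p_hex are assumptions for this example.

```r
library(ggplot2)  # geom_hex() also requires the hexbin package to be installed

p_hex <- ggplot(faithful, aes(x = waiting, y = eruptions)) +
  geom_hex(bins = 20) +   # each hexagon is shaded by how many eruptions fall in it
  theme_bw() +
  labs(x = "Waiting time to next eruption (minutes)",
       y = "Eruption duration (minutes)")

print(p_hex)
```

Compared with a scatterplot, the hexbin plot makes the dense regions easier to see when points overlap.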