Data Import

J. Hathaway

Becoming the Critic.

Visualization of the Day

Team Discussion

Case Study 4: Reducing Gun Deaths (FiveThirtyEight)

Case Study 5: I can clean your data

Task 9: Same Data Different Format

Being Readings

Structured Thinking

Structured thinking is a process of putting a framework to an unstructured problem. Having a structure not only helps an analyst understand the problem at a macro level, it also helps by identifying areas which require deeper understanding.

Structured Thinking (2)

How can these articles help you perform better in this class and your future work?

Hadley on Tidy Data

Comments from the Tidy paper?

“Happy families are all alike; every unhappy family is unhappy in its own way.”

– Leo Tolstoy

“Tidy datasets are all alike, but every messy dataset is messy in its own way.”

– Hadley Wickham

“There is one glory of the sun, and another glory of the moon, and another glory of the stars: for one star differeth from another star in glory.”

– Paul (1 Corinthians 15:41)

Really. How bad can it get?

Tidy Data and Analysis

Tidy

There are three interrelated rules which make a dataset tidy:

  1. Each variable must have its own column.
  2. Each observation must have its own row.
  3. Each value must have its own cell.
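
As a minimal sketch of the three rules, the small table below is made up for illustration (the `messy` data frame and its column names are hypothetical, not taken from the class file). It assumes tidyr 1.0 or later, where `pivot_longer()` is available.

```r
library(tidyr)

# Hypothetical messy table: one row per student, one column per semester.
# The semester is a variable, but here it is stored in the column names.
messy <- data.frame(
  student = c("A", "B"),
  fall    = c(3.1, 3.6),
  winter  = c(3.4, 3.5)
)

# pivot_longer() moves the semester names into a "semester" column and
# the scores into a "gpa" column, so each row is one observation.
tidy <- pivot_longer(
  messy,
  cols      = c(fall, winter),
  names_to  = "semester",
  values_to = "gpa"
)

tidy
```

After the reshape, every variable (student, semester, gpa) has its own column and every student-semester pair has its own row.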

Tidy messy data

This data request was based on the question, “How are the R/Python DS courses affecting students?”

  1. Look at the data and write down your top 3-5 concerns about using this file for analysis.
  2. Diagram how this file will need to change to be tidy (including removing most of the blank space).
  3. In pseudocode, write out the steps you will need to take to reach your final format.
  4. Review the tidyr documentation and find the functions that could help you with this task.

Importing Data

Board Activity

Write this chunk of code out as an English sentence you could say to your grandma.

  • Now write out the code in piped format

What is tempfile() doing?

Run the following line and look at bob. What is it?

  • Why would we want to use a tempfile?

Note that I am trying to save you from storing large data files in your Git repository.
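
A rough sketch of the idea, assuming `bob` holds the result of a `tempfile()` call. The download step is replaced here by writing a small built-in dataset, so the example runs without a network connection.

```r
# tempfile() returns a file path (a character string) inside the session's
# temporary directory. It does not create the file itself.
bob <- tempfile(fileext = ".csv")
class(bob)        # "character"
file.exists(bob)  # FALSE until something is written to that path

# Stand-in for a download: write a few rows of a built-in dataset there.
write.csv(head(mtcars), bob, row.names = FALSE)

# Read the data back in from the temporary location.
dat <- read.csv(bob)

# Delete the file when finished; nothing ends up in the Git repository.
unlink(bob)
```

Because the path lives outside the project folder, the data file is never staged or committed by accident.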

The data import packages

haven package

http://haven.tidyverse.org

readxl package

http://readxl.tidyverse.org
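
A small sketch of reading a workbook with readxl. It uses the example file that ships with the package, so no file from this class is assumed.

```r
library(readxl)

# readxl_example() returns the path to a workbook bundled with the package.
path <- readxl_example("datasets.xlsx")

# List the sheet names, then read the first sheet into a tibble.
sheets <- excel_sheets(path)
first_sheet <- read_excel(path, sheet = sheets[1])

head(first_sheet)
```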

downloader package

Just a wrapper around download.file()

Reading Files

Describe in your Task 9 readme what R is doing when you use a function like read.csv() or read_csv(), without using the word "read". Try to be succinct.

When you are done, push your file to GitHub.

What words did we use to describe the process?

What does parse mean?
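
One way to see parsing in action is to hand readr's `parse_*()` functions some text directly. This is only an illustration; the input strings are made up.

```r
library(readr)

# Every value in a text file starts out as a character string.
# Parsing turns each string into a typed value.
nums  <- parse_double(c("1", "2.5"))   # text -> numeric
money <- parse_number("$1,200")        # drops "$" and "," around the number
day   <- parse_date("2018-02-01")      # text -> Date (ISO 8601 by default)

class(nums)
class(day)
money
```

Functions such as `read_csv()` apply this kind of conversion to each column after splitting the file into fields.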

Reading in ASCII data as text

Using read_lines() from library(readr)
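
A minimal sketch of `read_lines()`, using a hypothetical three-line text file written to a temporary path so the example is self-contained.

```r
library(readr)

# Write a tiny, made-up ASCII file to a temporary path.
path <- tempfile(fileext = ".txt")
writeLines(c("ID  NAME", "001 Ann", "002 Raj"), path)

# read_lines() returns a character vector with one element per line.
lines <- read_lines(path)
length(lines)

# The text can then be cut apart by position, skipping the header line.
ids <- substr(lines[-1], 1, 3)
ids

unlink(path)
```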

Connecting to Databases

db.rstudio.com
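
As a hedged sketch of the database workflow, the example below uses DBI with an in-memory SQLite database. It assumes the RSQLite package is installed; a real project would connect to its own database instead.

```r
library(DBI)

# Open a throwaway SQLite database that lives only in memory.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into a table so there is something to query.
dbWriteTable(con, "mtcars", mtcars)

# Send SQL to the database and get the result back as a data frame.
res <- dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")
res

# Always close the connection when finished.
dbDisconnect(con)
```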

Excel with Excel

Semester Project & Class

Structure your work