Case Study 3: Becoming a databender

Background

You just started your internship at a big firm in New York, and your manager gave you an extensive file of flights that departed JFK, LGA, or EWR in 2013. The data is available in R as nycflights13::flights (run install.packages("nycflights13") once, then library(nycflights13)). Using this data, your manager wants you to answer the following questions:

  1. If I am leaving before noon, which two airlines do you recommend at each airport (JFK, LGA, EWR) that will have the lowest delay time at the 75th percentile?
  2. Which origin airport is best to minimize my chances of a late arrival when I am using Delta Airlines?
  3. Which destination airport is the worst (you decide on the metric for worst) airport for arrival time?
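As a starting point for question 1, the following is a minimal sketch of a grouped percentile summary, not a finished answer. It assumes that "leaving before noon" means a scheduled departure before 12:00 (sched_dep_time is stored as an HHMM integer) and that "delay" means departure delay in minutes (dep_delay); both are interpretations you should state and defend in your report. The names morning_p75, n_flights, and p75_dep_delay are made up for this example.

```r
library(nycflights13)
library(dplyr)

# Assumption: "before noon" = scheduled departure earlier than 12:00,
# and "delay" = departure delay in minutes (dep_delay).
morning_p75 <- flights %>%
  # Cancelled flights have no recorded departure delay, so drop them.
  filter(sched_dep_time < 1200, !is.na(dep_delay)) %>%
  group_by(origin, carrier) %>%
  summarise(
    n_flights     = n(),
    p75_dep_delay = unname(quantile(dep_delay, 0.75)),
    .groups       = "drop"
  ) %>%
  # Sort so the carriers with the lowest 75th percentile come first
  # within each origin airport.
  arrange(origin, p75_dep_delay)

morning_p75
```

Keeping n_flights in the summary lets you check whether a carrier's low percentile rests on only a handful of flights before you recommend it.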

Reading

This reading will help you complete the tasks below.

Tasks

  • [ ] Address at least two of the three questions in the background description (if you have time, try to tackle all three)
  • [ ] Include at least one visualization that shows the complexity of the data
  • [ ] Create one .Rmd file that has your report
    • [ ] Have a section for each question
    • [ ] Make sure your code is in the report but defaults to hidden
    • [ ] Write an introduction section that describes your results
    • [ ] Make a plot of the data to show the answer to the specific question
  • [ ] Push your .Rmd, .md, and .html to your GitHub repo
  • [ ] Be prepared to discuss your analysis in the upcoming class
  • [ ] Complete the recommended reading on posting issues
  • [ ] Find two other students' compiled files in their repositories and provide feedback using the issues feature in GitHub (if a student already has three issues, find a different student to critique)
  • [ ] Address 1-2 of the issues posted on your project and push the updates to GitHub
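
One possible way to satisfy the "code in the report but hidden by default" and ".Rmd, .md, and .html" items together is sketched below. It uses the code_folding and keep_md arguments of rmarkdown::html_document; the file name report.Rmd is a placeholder for whatever you call your report. The same two options can also be set in the YAML header of the .Rmd file instead.

```r
library(rmarkdown)

# code_folding = "hide" keeps every chunk in the HTML output but collapses it
# behind a "Code" button; keep_md = TRUE also saves the intermediate .md file.
report_format <- html_document(code_folding = "hide", keep_md = TRUE)

# "report.Rmd" is a hypothetical file name; render only if the file exists.
if (file.exists("report.Rmd")) {
  render("report.Rmd", output_format = report_format)
}
```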

I made up databending. It does not mean that we make up data or that we alter it. Like airbenders, we control our data to answer the questions we need answered. The key to databending is flexibility: finding and following the path of least resistance.