Week 4-5: Project 2 - Flights

JSON files are the format of choice for sharing information and data between apps on the internet. When you hear someone explain that you can use an API to get the data, there is usually a JSON file involved. The history of JSON is worth reading. We will have another project analyzing data from JSON files that are missing values. Are we missing JSON on our flight?

The flights JSON file and the data description

Grand Questions

  1. Which airport has the worst delays? How did you choose to define “worst”? As part of your answer, include a table that lists, for each airport, the total number of flights, the total number of delayed flights, the proportion of delayed flights, and the average delay time in hours.

  2. What is the worst month to fly if you want to avoid delays? Include one chart to help support your answer, with the x-axis ordered by month. You also need to explain and justify how you chose to handle the missing Month data.

  3. According to the BTS website, the Weather category only accounts for severe weather delays. Other “mild” weather delays are included as part of the NAS category and the Late-Arriving Aircraft category. Calculate the total number of flights delayed by weather (either severe or mild) using these two rules:

    1. 30% of all delayed flights in the Late-Arriving category are due to weather.
    2. From April to August, 40% of delayed flights in the NAS category are due to weather. The rest of the months, the proportion rises to 65%.
  4. Create a bar plot showing the proportion of all flights that are delayed by weather at each airport. What do you learn from this graph? Be careful to handle the missing Late Aircraft data correctly.

  5. Fix all of the varied NA types in the data (the different ways missing values are recorded) and save the file in the same format in which it was provided. Provide one example record from the file showing the new NA values.
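
The arithmetic behind the two rules in question 3, together with the NA cleanup in question 5, can be sketched as follows. This is only an illustration under stated assumptions, not the expected solution. The field names (`num_of_delays_weather`, `num_of_delays_late_aircraft`, `num_of_delays_nas`, `month`), the sample records, and the set of NA markers are all hypothetical, so check the data description and the file itself for the real ones. Counting severe weather delays in full and returning `None` when an input is missing are also assumptions; you still need to choose and justify your own handling of missing values.

```python
import json
import math

# Assumed placeholders that may stand in for "missing" (hypothetical; inspect the real file).
NA_MARKERS = {"", "n/a", "NA", -999}

# Months in which the lower NAS weather share (40%) applies, per rule 2.
APRIL_TO_AUGUST = {"April", "May", "June", "July", "August"}


def clean(value):
    """Return None for any assumed NA marker, otherwise the value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return None if value in NA_MARKERS else value


def weather_delays(record):
    """Estimate weather-delayed flights for one airport-month record.

    Returns None when any input the rules need is missing (one possible choice).
    """
    severe = clean(record.get("num_of_delays_weather"))
    late = clean(record.get("num_of_delays_late_aircraft"))
    nas = clean(record.get("num_of_delays_nas"))
    month = clean(record.get("month"))
    if None in (severe, late, nas, month):
        return None
    nas_share = 0.40 if month in APRIL_TO_AUGUST else 0.65
    # Assumption: every delay in the Weather category counts as a weather delay.
    return severe + 0.30 * late + nas_share * nas


# Made-up records, only to exercise the functions.
records = json.loads("""[
  {"airport_code": "AAA", "month": "January", "num_of_delays_weather": 10,
   "num_of_delays_late_aircraft": 100, "num_of_delays_nas": 200},
  {"airport_code": "AAA", "month": "June", "num_of_delays_weather": 10,
   "num_of_delays_late_aircraft": 100, "num_of_delays_nas": 200},
  {"airport_code": "AAA", "month": "n/a", "num_of_delays_weather": 10,
   "num_of_delays_late_aircraft": -999, "num_of_delays_nas": 200}
]""")

estimates = [weather_delays(r) for r in records]

# Question 5: replace every assumed NA marker with None, which JSON writes as null.
cleaned = [{key: clean(value) for key, value in r.items()} for r in records]
cleaned_json = json.dumps(cleaned, indent=2)
print(estimates)
print(cleaned_json)
```

With these made-up records, the January row uses the 65% NAS share, the June row uses the 40% share, and the third row produces no estimate because its month and Late Aircraft count are both missing. Replacing each marker with `None` means every missing value is written out as the same JSON `null`.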