JSON files are the format of choice for sharing information and data between apps on the internet. When you hear someone explain that you can use an API to get the data, there is usually a JSON file involved. The history of JSON is worth reading. We will have another project analyzing data from JSON files that are missing values. Are we missing JSON on our flight?
Completed Readings: P4DS: Chapter 5 Data transformation, P4DS: Section 7.4 Missing Values, Python Data Science Handbook: Missing Data, How to Handle Missing Data, and Wikipedia Missing Data
The flights JSON File and the Data Description
Grand Questions
Which airport has the worst delays? How did you choose to define “worst”? As part of your answer include a table that lists the total number of flights, total number of delayed flights, proportion of delayed flights, and average delay time in hours, for each airport.
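One way to build the per-airport table is sketched below. It is only an illustration: the field names (`airport_code`, `num_of_flights_total`, `num_of_delays_total`, `minutes_delayed_total`) and the sample records are assumptions, so check them against the Data Description. It also assumes that "average delay time" means total delay minutes divided by the number of delayed flights, converted to hours.

```python
from collections import defaultdict

# Hypothetical records; field names and values are assumptions for illustration.
records = [
    {"airport_code": "AAA", "num_of_flights_total": 100,
     "num_of_delays_total": 20, "minutes_delayed_total": 1200},
    {"airport_code": "AAA", "num_of_flights_total": 300,
     "num_of_delays_total": 60, "minutes_delayed_total": 3600},
    {"airport_code": "BBB", "num_of_flights_total": 200,
     "num_of_delays_total": 50, "minutes_delayed_total": 6000},
]


def summarize_by_airport(rows):
    """Aggregate flights, delays, and delay minutes for each airport."""
    totals = defaultdict(lambda: {"flights": 0, "delays": 0, "minutes": 0})
    for row in rows:
        t = totals[row["airport_code"]]
        t["flights"] += row["num_of_flights_total"]
        t["delays"] += row["num_of_delays_total"]
        t["minutes"] += row["minutes_delayed_total"]

    summary = {}
    for airport, t in totals.items():
        summary[airport] = {
            "total_flights": t["flights"],
            "total_delays": t["delays"],
            # Share of all flights at this airport that were delayed.
            "prop_delayed": t["delays"] / t["flights"],
            # Assumed definition: minutes per delayed flight, in hours.
            "avg_delay_hours": t["minutes"] / t["delays"] / 60,
        }
    return summary


summary = summarize_by_airport(records)
for airport, stats in sorted(summary.items()):
    print(airport, stats)
```

Sorting this summary by `prop_delayed` or by `avg_delay_hours` can give different rankings, which is why the question asks you to state how you defined "worst".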
What is the worst month to fly if you want to avoid delays? Include one chart to help support your answer, with the x-axis ordered by month. You also need to explain and justify how you chose to handle the missing Month data.
According to the BTS website, the Weather category only accounts for severe weather delays. Other “mild” weather delays are included as part of the NAS category and the Late-Arriving Aircraft category. Calculate the total number of flights delayed by weather (either severe or mild) using these two rules:
- 30% of all delayed flights in the Late-Arriving category are due to weather.
- From April to August, 40% of delayed flights in the NAS category are due to weather. The rest of the months, the proportion rises to 65%.
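The two rules above can be sketched as a small function. This is an illustration under stated assumptions, not a definitive solution: the field names (`month`, `num_of_delays_weather`, `num_of_delays_late_aircraft`, `num_of_delays_nas`) are assumed, months are assumed to be stored as full English names, and any missing values are assumed to have been handled before the function is called.

```python
# Months in which the NAS weather share is 40% (April through August).
APRIL_TO_AUGUST = {"April", "May", "June", "July", "August"}

LATE_AIRCRAFT_WEATHER_SHARE = 0.30


def weather_delays(row):
    """Estimate flights delayed by severe or mild weather for one record."""
    nas_share = 0.40 if row["month"] in APRIL_TO_AUGUST else 0.65
    return (
        row["num_of_delays_weather"]  # severe weather, counted in full
        + LATE_AIRCRAFT_WEATHER_SHARE * row["num_of_delays_late_aircraft"]
        + nas_share * row["num_of_delays_nas"]
    )


# Hypothetical records with made-up counts.
july = {"month": "July", "num_of_delays_weather": 10,
        "num_of_delays_late_aircraft": 100, "num_of_delays_nas": 50}
january = {"month": "January", "num_of_delays_weather": 10,
           "num_of_delays_late_aircraft": 100, "num_of_delays_nas": 50}

july_total = weather_delays(july)
january_total = weather_delays(january)
print(july_total, january_total)
```

With identical counts, the January record yields the larger estimate because the NAS share is 65% outside April to August.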
Create a barplot showing the proportion of all flights that are delayed by weather at each airport. What do you learn from this graph? (Be careful to handle the missing Late Aircraft data correctly.)
Fix all of the varied NA types in the data and save the file back out in the same format that was provided. Provide one example from the file with the new NA values shown.
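A minimal sketch of the last task follows. It assumes the file is a JSON list of records and that missing values appear as placeholders such as `""`, `"n/a"`, `-999`, or `NaN`; these markers and the field names are assumptions, so inspect the real file to confirm which variants occur. Every marker is mapped to `None`, which the `json` module writes as `null`.

```python
import json
import math

# Assumed placeholders for "missing"; confirm against the actual file.
MISSING_STRINGS = {"", "n/a", "na", "nan", "null"}
MISSING_NUMBERS = {-999}


def normalize_missing(value):
    """Return None for any assumed missing marker, else the value unchanged."""
    if value is None:
        return None
    if isinstance(value, str):
        return None if value.strip().lower() in MISSING_STRINGS else value
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, (int, float)) and value in MISSING_NUMBERS:
        return None
    return value


def clean_records(records):
    """Apply normalize_missing to every field of every record."""
    return [{key: normalize_missing(val) for key, val in rec.items()}
            for rec in records]


# Hypothetical input that mixes several missing-value styles.
raw = """[
  {"airport_code": "AAA", "month": "n/a",
   "num_of_delays_late_aircraft": -999, "year": 2005},
  {"airport_code": "", "month": "January",
   "num_of_delays_late_aircraft": 12, "year": NaN}
]"""

cleaned = clean_records(json.loads(raw))

# Serialize back to JSON text; None becomes null in the output.
output_text = json.dumps(cleaned, indent=2)
print(output_text)
```

To save the result, the same `cleaned` list can be passed to `json.dump` with an open file handle. Printing one cleaned record is a simple way to show the new NA values.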