Day 1: Intro to Flights Data

Welcome to class!

Gratitude Journal

Announcements


Project 2: Late Flights and Missing Data

JSON files (JavaScript Object Notation)

Today, JSON is the de-facto standard for exchanging data between web and mobile clients and back-end services. (source)


Introduce the data

Load the JSON file and spend a few minutes studying it. Can you learn enough about it to describe the columns and rows?
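As a starting point, loading might look something like the sketch below. The class file is not named here, so the example reads a tiny made-up JSON string instead; `raw_json`, `airport_code`, and `month` are hypothetical stand-ins, and in practice you would pass the real file path or URL to `pd.read_json()`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the class file: a small JSON array of records.
# The airport_code and month fields are invented for illustration.
raw_json = """
[
  {"airport_code": "ATL", "month": "January", "num_of_delays_weather": 12},
  {"airport_code": "SLC", "month": "January", "num_of_delays_weather": 7},
  {"airport_code": "ATL", "month": "February", "num_of_delays_weather": null}
]
"""

# pd.read_json accepts a file path, a URL, or a file-like object.
flights = pd.read_json(io.StringIO(raw_json))

# First things to look at: size, column types, and a few rows.
print(flights.shape)
print(flights.dtypes)
print(flights.head())
```

Notice that a JSON `null` arrives in pandas as a missing value, which is one of the things to watch for while studying the data.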

Hints:

  • You can use .describe() to learn about the distribution of a numeric variable.
  • You can use .value_counts() to learn about the distribution of a categorical variable.
  • pd.crosstab() creates a “cross tabulation” of two or more categorical variables.
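Here is a rough sketch of the three hints on a small made-up table. The values and the `airport_code` and `month` columns are assumptions for illustration only, not the real flights data.

```python
import pandas as pd

# Made-up rows for illustration; not the real flights data.
flights = pd.DataFrame({
    "airport_code": ["ATL", "ATL", "SLC", "SLC", "SLC"],
    "month": ["January", "February", "January", "January", "February"],
    "num_of_delays_weather": [12, 30, 7, 9, 4],
})

# Numeric variable: count, mean, std, min, quartiles, max.
summary = flights["num_of_delays_weather"].describe()
print(summary)

# Categorical variable: how many rows fall in each category.
counts = flights["airport_code"].value_counts()
print(counts)

# Two categorical variables: rows of one against columns of the other.
# Note that crosstab is a pandas function, not a DataFrame method.
table = pd.crosstab(flights["airport_code"], flights["month"])
print(table)
```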

Can you trust the data?

Do you notice anything interesting about the flights data?


Question Brainstorming

In your group, try to answer the following questions about your assigned “Grand Question”:

  • What is our goal?
  • How can we get there?
  • What will the answer look like when we’re done?



Project 2 FAQs

Not all missing data is represented as np.nan. For example, look at the column that counts delays due to late aircraft.
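One way to spot this kind of hidden missing value is sketched below. The rows are made up, and the placeholder value `-999` is an assumption for illustration; check the real column to see what it actually uses.

```python
import pandas as pd

# Made-up rows; -999 is an assumed placeholder for "missing".
flights = pd.DataFrame({
    "airport_code": ["ATL", "DEN", "SLC", "SAN"],
    "num_of_delays_late_aircraft": [1109, -999, 928, -999],
})

col = "num_of_delays_late_aircraft"

# isna() only finds np.nan / None, so it reports nothing missing here.
n_nan = flights[col].isna().sum()
print(n_nan)

# A count of delays can never be negative, so the minimum is a clue.
print(flights[col].describe())

# Count the rows holding an impossible (negative) value.
n_impossible = (flights[col] < 0).sum()
print(n_impossible)
```

The general idea is to compare what `.isna()` reports against values that cannot be real, such as a negative count.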

We will learn how to identify and deal with missing data next week. For now, we can drop rows we don’t want by filtering with square brackets [] or .query().

  • num_of_delays_weather
  • num_of_delays_late_aircraft
  • num_of_delays_nas
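Dropping unwanted rows with square brackets or .query() could look roughly like this. The three column names come from the list above, but the values are made up and the `>= 0` condition is just an example of a rule you might choose.

```python
import pandas as pd

# Made-up values; only the column names come from the project data.
flights = pd.DataFrame({
    "num_of_delays_weather": [12, 30, 7, 9],
    "num_of_delays_late_aircraft": [1109, -999, 928, -999],
    "num_of_delays_nas": [400, 250, 310, 120],
})

# Square brackets: keep rows where the boolean condition is True.
kept_brackets = flights[flights["num_of_delays_late_aircraft"] >= 0]

# .query(): the same condition written as a string.
kept_query = flights.query("num_of_delays_late_aircraft >= 0")

print(kept_brackets)
print(kept_query)
```

Both approaches return a new DataFrame and leave `flights` unchanged, so assign the result to a name if you want to keep it.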