Day 1: Intro to Flights Data

Welcome to class!

Spiritual Thought

Short

Link

Project 1 Comments

  1. Don’t dump raw data into the report as a table. Only include tables that add useful information; if I have to scroll up and down to read one, it isn’t useful.
  2. Reports should be readable by an intelligent but non-technical audience. Use meaningful titles and section names.
  3. Make it something you’d like to read.
  4. Clean out any code output or logs that distract from the message (“My Useless Chart”).
  5. Eliminate “warnings”.

Project 2: Late Flights and Missing Data

JSON files (JavaScript Object Notation)

Today, JSON is the de facto standard for exchanging data between web and mobile clients and back-end services. (source)


What is JSON?
[
  {
    "car": "Mazda RX4",
    "mpg": 21,
    "cyl": 6,
    "disp": 160,
    "hp": 110,
    "drat": 3.9,
    "wt": 2.62,
    "qsec": 16.46,
    "vs": 0,
    "am": 1,
    "gear": 4,
    "carb": 4
  },
  {
    "car": "Mazda RX4 Wag",
    "mpg": 21,
    "cyl": 6,
    "disp": 160,
    "hp": 110,
    "drat": 3.9,
    "wt": 2.875,
    "qsec": 17.02,
    "am": 1,
    "gear": 4,
    "carb": 4
  }
]

Introduce the data

Load the JSON file and spend a few minutes studying it. Can you learn enough about it to describe the columns and rows?
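
As a rough sketch of the loading step, the snippet below reads a small JSON string with pd.read_json. The JSON text is a hypothetical stand-in shaped like the cars example above, not the project file; for the real data you would pass the file path or URL you were given instead of the StringIO wrapper.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project file: two records in the same
# list-of-objects layout as the cars example.
json_text = """
[
  {"car": "Mazda RX4", "mpg": 21, "cyl": 6, "vs": 0},
  {"car": "Mazda RX4 Wag", "mpg": 21, "cyl": 6}
]
"""

# Each JSON object becomes a row; each key becomes a column.
cars = pd.read_json(io.StringIO(json_text))

print(cars.shape)         # (number of rows, number of columns)
print(cars.dtypes)        # the type pandas inferred for each column
print(cars.isna().sum())  # a key absent from a record shows up as NaN
```

Note that the second record has no "vs" key, so pandas fills that cell with NaN. Checking shape, dtypes, and missing counts first is one reasonable way to start describing the rows and columns.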

Hints:

  • You can use .describe() to learn about the distribution of a numeric variable.
  • You can use .value_counts() to learn about the distribution of a categorical variable.
  • pd.crosstab() creates a “cross tabulation” of two or more categorical variables. It is a pandas function, not a DataFrame method, so you pass the columns to it.
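
The three hints could be tried out like this. The DataFrame below is made-up toy data, and the column names airport, month, and num_of_delays_total are assumptions for illustration; swap in the columns from the file you loaded.

```python
import pandas as pd

# Hypothetical toy data, not the real flights file.
flights = pd.DataFrame({
    "airport": ["SLC", "SLC", "ATL", "ATL", "DEN", "DEN"],
    "month": ["January", "February", "January", "February", "January", "January"],
    "num_of_delays_total": [120, 95, 310, 280, 150, 170],
})

# Numeric column: count, mean, std, min, quartiles, max.
summary = flights["num_of_delays_total"].describe()
print(summary)

# Categorical column: how many rows fall in each category.
counts = flights["airport"].value_counts()
print(counts)

# Two categorical columns: row counts for every combination.
# crosstab is a top-level pandas function, so the columns are passed in.
table = pd.crosstab(flights["airport"], flights["month"])
print(table)
```

A zero in the crosstab means that combination never appears in the data, which is often the first clue that something is missing or mislabeled.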

Can you trust the data?

Do you notice anything interesting about the flights data?


Question Brainstorming

In your group, work through the following for your assigned question:

  • What is our goal?
  • How can we get there?
  • What will the answer look like when we’re done?



Project 2 FAQs

Not all missing data is represented as np.nan. For example, look at the column that counts delays due to late aircraft.
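
Here is a small sketch of why such values are easy to overlook. The placeholder values used below (-999, "n/a", and an empty string) are assumptions chosen for illustration, as is the toy data; inspect the real file to see what it actually contains.

```python
import pandas as pd

# Hypothetical toy data with stand-in placeholders for "missing".
df = pd.DataFrame({
    "num_of_delays_late_aircraft": [1109, -999, 928, -999, 1058],
    "month": ["January", "n/a", "March", "April", ""],
})

# .isna() only flags true NaN/None, so it finds nothing here.
n_flagged_by_isna = int(df.isna().sum().sum())
print(n_flagged_by_isna)

# A count of delays cannot be negative, so negative values are suspicious.
col = df["num_of_delays_late_aircraft"]
n_negative = int((col < 0).sum())
print(col.min(), n_negative)

# For text columns, value_counts() exposes odd labels and empty strings.
month_counts = df["month"].value_counts()
print(month_counts)
```

The general idea is to compare what .isna() reports against what .describe() and .value_counts() show; impossible minimums and unexpected labels usually point to disguised missing values.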

We will learn how to identify and deal with missing data next week. For now, we can drop rows we don’t want using square brackets [] or .query().

  • num_of_delays_weather
  • num_of_delays_late_aircraft
  • num_of_delays_nas
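
Dropping unwanted rows with square brackets or .query(), as mentioned above, might look like the sketch below. The data is a toy example, and treating -999 as the unwanted value is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical toy data; -999 stands in for an unwanted value.
flights = pd.DataFrame({
    "airport": ["SLC", "ATL", "DEN", "SLC"],
    "num_of_delays_weather": [40, 95, 60, 35],
    "num_of_delays_late_aircraft": [1109, -999, 928, 1058],
    "num_of_delays_nas": [300, 410, -999, 280],
})

# Square brackets: keep the rows where the boolean mask is True.
kept_brackets = flights[flights["num_of_delays_late_aircraft"] != -999]

# .query(): the same filter written as a string expression.
kept_query = flights.query("num_of_delays_late_aircraft != -999")

# Conditions can be combined with "and" inside .query().
kept_both = flights.query(
    "num_of_delays_late_aircraft != -999 and num_of_delays_nas != -999"
)

print(len(flights), len(kept_brackets), len(kept_both))
```

Both forms return a new DataFrame and leave flights unchanged, so assign the result to a name if you want to keep working with the filtered rows.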