Welcome to class!
Spiritual Thought
Short
Project 1 Comments
- Don’t include data as a table. Only include tables that add useful information. If I have to scroll up and down it isn’t useful.
- Reports should be readable by an intelligent, but non-technical audience (Meaningful titles and section names)
- Make it like something you’d like to read
- Clean out any code output, logs, that distract from the message (“My Useless Chart”)
- Eliminate “warnings”
Project 2: Late Flights and Missing Data
JSON files (JavaScript Object Notation)
Today, JSON is the de-facto standard for exchanging data between web and mobile clients and back-end services. source
What is JSON?
[
{
"car": "Mazda RX4",
"mpg": 21,
"cyl": 6,
"disp": 160,
"hp": 110,
"drat": 3.9,
"wt": 2.62,
"qsec": 16.46,
"vs": 0,
"am": 1,
"gear": 4,
"carb": 4
},
{
"car": "Mazda RX4 Wag",
"mpg": 21,
"cyl": 6,
"disp": 160,
"hp": 110,
"drat": 3.9,
"wt": 2.875,
"qsec": 17.02,
"am": 1,
"gear": 4,
"carb": 4
}
]
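As a rough sketch of how this structure maps into Python: a JSON array of objects parses into a list of dictionaries, and pandas can turn that list into one row per object and one column per key. The records below are trimmed copies of the two cars above (only a few keys kept, which is a simplification for illustration). Notice that the second record has no "vs" key, so pandas fills that cell with NaN.

```python
import json

import pandas as pd

# Trimmed copies of the two example records; the second has no "vs" key.
raw = """
[
  {"car": "Mazda RX4", "mpg": 21, "cyl": 6, "vs": 0, "am": 1},
  {"car": "Mazda RX4 Wag", "mpg": 21, "cyl": 6, "am": 1}
]
"""

records = json.loads(raw)     # JSON array of objects -> Python list of dicts
cars = pd.DataFrame(records)  # one row per object, one column per key

print(type(records), len(records))
print(cars)

# A key that is absent from a record becomes NaN in that row.
print(cars["vs"].isna().sum())
```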
Introduce the data
Load the JSON file and spend a few minutes studying it. Can you learn enough about it to describe the columns and rows?
Hints:
- You can use .describe() to learn about the distribution of a numeric variable.
- You can use .value_counts() to learn about the distribution of a categorical variable.
- pd.crosstab() creates a “cross tabulation” of two or more categorical variables.
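The hints above can be tried on a small example before pointing them at the flights file. This is only a sketch: it reads JSON from an in-memory string rather than the project file, and the rows named "Car C" and "Car D" are made up for illustration so that the summaries have something to count.

```python
from io import StringIO

import pandas as pd

# First two rows come from the JSON example; "Car C" and "Car D" are invented.
raw = """
[
  {"car": "Mazda RX4", "mpg": 21, "cyl": 6, "am": 1},
  {"car": "Mazda RX4 Wag", "mpg": 21, "cyl": 6, "am": 1},
  {"car": "Car C", "mpg": 18, "cyl": 8, "am": 0},
  {"car": "Car D", "mpg": 30, "cyl": 4, "am": 1}
]
"""

df = pd.read_json(StringIO(raw))

# Numeric variable: count, mean, std, min, quartiles, max.
summary = df["mpg"].describe()
print(summary)

# Categorical variable: how many rows fall in each category.
counts = df["cyl"].value_counts()
print(counts)

# Two categorical variables: rows are cyl, columns are am, cells are counts.
table = pd.crosstab(df["cyl"], df["am"])
print(table)
```

With the real data, the same three calls should work once the file is loaded with pd.read_json() and the column names are swapped for ones in the flights data.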
Can you trust the data?
Do you notice anything interesting about the flights data?
Question Brainstorming
In your group, try to answer the following questions about your assigned question:
- What is our goal?
- How can we get there?
- What will the answer look like when we’re done?
Project 2 FAQs
Not all missing data is represented as np.nan. For an example, look at the column that counts delays due to late aircraft.
We will learn how to identify and deal with missing data next week. For now, we can drop rows we don’t want using square brackets [] or .query().
num_of_delays_weather
num_of_delays_late_aircraft
num_of_delays_nas
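Here is one way the row-dropping advice might look in practice. Everything in the small table below is invented for illustration, including the airport codes, the counts, and the placeholder value -999 standing in for "missing". Check the real file to see what placeholder, if any, it actually uses. Only the two delay column names are taken from the project data.

```python
import numpy as np
import pandas as pd

# Made-up rows; -999 is an assumed placeholder, not a confirmed value.
flights = pd.DataFrame({
    "airport_code": ["AAA", "BBB", "CCC", "DDD"],
    "num_of_delays_late_aircraft": [120, -999, 85, 40],
    "num_of_delays_weather": [10, 5, np.nan, 7],
})

# isna() only counts true NaN values, so the placeholder goes unnoticed.
print(flights.isna().sum())

# Option 1: square brackets with a boolean condition.
kept_brackets = flights[flights["num_of_delays_late_aircraft"] != -999]

# Option 2: .query() with the same condition written as a string.
kept_query = flights.query("num_of_delays_late_aircraft != -999")

print(kept_brackets)
print(kept_brackets.equals(kept_query))
```

Both options keep the same rows, so the choice between them is mostly a matter of which reads more clearly to you.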