Welcome to class!
Spiritual Thought
Announcements
- Code chunk options:
- Locally using #| warning: false
- Globally in the YAML using execute: warning: false
Flights Data Issues:
What are some of the data issues you discovered while getting to know your data?
Loading JSON files into pandas
Let’s load in some practice data! Data link.
Here’s a description of the data: Data Description.
import pandas as pd # to load and transform data
import numpy as np # for math/stat calculations
# from url to pandas dataframe
url = "https://github.com/byuidatascience/data4missing/raw/master/data-raw/mtcars_missing/mtcars_missing.json"
cars = pd.read_json(url)
# or from file to pandas dataframe
cars = pd.read_json("mtcars_missing.json")
Look at the data for the first two cars. What is different about the format?
[
{
"car": "Mazda RX4",
"mpg": 21,
"cyl": 6,
"disp": 160,
"hp": 110,
"drat": 3.9,
"wt": 2.62,
"qsec": 16.46,
"vs": 0,
"am": 1,
"gear": 4,
"carb": 4
},
{
"car": "Mazda RX4 Wag",
"mpg": 21,
"cyl": 6,
"disp": 160,
"hp": 110,
"drat": 3.9,
"wt": 2.875,
"qsec": 17.02,
"am": 1,
"gear": 4,
"carb": 4
}
]
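One thing worth noticing in the two records above is that the second one has no "vs" key. The sketch below shows how pandas seems to handle that case. It uses a trimmed, hand-typed copy of the two records (only a few of the fields), and the names `raw` and `demo` are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Trimmed, hand-typed copy of the two records shown above.
# The second record has no "vs" key.
raw = """
[
  {"car": "Mazda RX4", "mpg": 21, "cyl": 6, "vs": 0, "am": 1},
  {"car": "Mazda RX4 Wag", "mpg": 21, "cyl": 6, "am": 1}
]
"""

# Wrapping the string in StringIO lets read_json treat it like a file.
demo = pd.read_json(StringIO(raw))

# A key that is absent from a record becomes a missing value (NaN).
print(demo)
print(demo["vs"].isna())
```

Because JSON records do not have to share the same keys, it is worth checking for missing values right after loading.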
Your Turn: Transforming Data
With your group, research these functions and create an example using the cars
data. Post your example in Slack. Be prepared to teach the class about your functions.
You can use the Data Transformation textbook chapter and the pandas documentation to help you.
Recreate the following output to the best of your abilities:
Group 1: Working with rows
- .query() allows you to subset observations (rows)
- .sort_values() arranges rows in a particular order
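Here is a minimal sketch of the two row functions. `cars_demo` is a hypothetical stand-in for the cars data: it uses the same column names, but only a few hand-typed rows, so the numbers should be treated as illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the cars data (illustrative values only).
cars_demo = pd.DataFrame({
    "car": ["Mazda RX4", "Datsun 710", "Hornet Sportabout", "Valiant"],
    "mpg": [21.0, 22.8, 18.7, 18.1],
    "cyl": [6, 4, 8, 6],
    "wt": [2.62, 2.32, 3.44, 3.46],
})

# Keep only the rows where the condition is true.
six_cyl = cars_demo.query("cyl == 6")

# Order the rows from highest to lowest mpg.
by_mpg = cars_demo.sort_values("mpg", ascending=False)

print(six_cyl)
print(by_mpg)
```

Neither call changes `cars_demo` itself; each returns a new dataframe.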
Group 2: Working with columns
- .filter() (as well as [] and .loc[]) allow you to select columns
- .assign() is one way to add new columns to a dataframe
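A short sketch of the column functions, again on a hypothetical stand-in called `cars_demo` (hand-typed, illustrative values). The new column name `mpg_per_wt` is made up for this example.

```python
import pandas as pd

# Hypothetical stand-in for the cars data (illustrative values only).
cars_demo = pd.DataFrame({
    "car": ["Mazda RX4", "Datsun 710", "Hornet Sportabout", "Valiant"],
    "mpg": [21.0, 22.8, 18.7, 18.1],
    "cyl": [6, 4, 8, 6],
    "wt": [2.62, 2.32, 3.44, 3.46],
})

# Three ways to select the same two columns.
cols_filter = cars_demo.filter(items=["car", "mpg"])
cols_bracket = cars_demo[["car", "mpg"]]
cols_loc = cars_demo.loc[:, ["car", "mpg"]]

# assign() returns a copy with the new column added.
with_ratio = cars_demo.assign(mpg_per_wt=lambda d: d["mpg"] / d["wt"])

print(cols_filter)
print(with_ratio)
```

Note that `.assign()` leaves the original dataframe untouched, so the result has to be saved to a name if you want to keep it.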
Group 3: Counting items
- .value_counts() summarizes a column by counting the values inside
- pd.crosstab() creates a “cross tabulation” of two or more variables
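A possible example of both counting tools, using a hypothetical stand-in called `cars_demo` with a few hand-typed rows (illustrative values). One detail to keep in mind: `value_counts()` is a method on a column, while `crosstab` is called from pandas itself as `pd.crosstab()`.

```python
import pandas as pd

# Hypothetical stand-in for the cars data (illustrative values only).
cars_demo = pd.DataFrame({
    "car": ["Mazda RX4", "Datsun 710", "Hornet Sportabout", "Valiant"],
    "cyl": [6, 4, 8, 6],
    "am": [1, 1, 0, 0],
})

# How many cars have each number of cylinders?
cyl_counts = cars_demo["cyl"].value_counts()

# Count cars for every combination of cyl (rows) and am (columns).
cyl_by_am = pd.crosstab(cars_demo["cyl"], cars_demo["am"])

print(cyl_counts)
print(cyl_by_am)
```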
Group 4: Summarizing data
- Using .groupby() and .agg() together allows you to calculate group summaries
Your Turn: Summarizing the cars data
Write code to calculate the mean weight (wt) for each cylinder type (cyl).
cars.groupby('cyl').agg(mean_weight = ('wt', 'mean')).reset_index() # 'mean' avoids the warning newer pandas gives for np.mean
Can you print the answer as a markdown table?
print(cars.groupby('cyl').agg(mean_weight = ('wt', 'mean')).reset_index().to_markdown(index = False)) # to_markdown() needs the tabulate package
Project 2 FAQs
One main reason to use assign() instead of creating the column directly:
You can create multiple columns within the same assign() call, where one of the columns depends on another one defined within the same call. Source: Documentation
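To make that concrete, here is a small sketch on a hypothetical stand-in called `cars_demo` (hand-typed, illustrative values). The column names `wt_lbs` and `hp_per_lb` are made up, and the conversion assumes `wt` is recorded in thousands of pounds.

```python
import pandas as pd

# Hypothetical stand-in for the cars data (illustrative values only).
cars_demo = pd.DataFrame({
    "car": ["Mazda RX4", "Datsun 710", "Hornet Sportabout"],
    "hp": [110, 93, 175],
    "wt": [2.62, 2.32, 3.44],  # assumed to be in 1000s of pounds
})

# The second column uses wt_lbs, which is created earlier in the same call.
with_both = cars_demo.assign(
    wt_lbs=lambda d: d["wt"] * 1000,
    hp_per_lb=lambda d: d["hp"] / d["wt_lbs"],
)

print(with_both)
```

The lambdas matter here: each one receives the dataframe as it looks at that point in the call, so `wt_lbs` already exists when `hp_per_lb` is computed.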
Other resources:
- Why use pandas.assign rather than simply initialize new column?
- 3 Ways to Add New Columns to Pandas Dataframe
Not related, but also fun: Should you use “dot notation” or “bracket notation” with pandas?
Two ways to define the same function:
# with def
def square(x):
    return x**2
# with lambda
square = lambda x: x**2
There are some differences between them, listed below.
- lambda is a keyword that returns a function object without binding it to a name, whereas def creates a name in the local namespace.
- lambda functions are good for situations where you want to minimize lines of code, because a lambda is an expression and can be written inline, for example as an argument to another function. A def statement cannot be used inline that way.
- lambda functions are somewhat less readable for most Python users.
- lambda functions can only be used once, unless assigned to a variable name.
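The differences listed above can be checked with a few lines of plain Python. The name `square_lambda` is only used here so both versions can exist side by side.

```python
# def binds the function to the name "square".
def square(x):
    return x ** 2

# A lambda only gets a name if you assign it to one yourself.
square_lambda = lambda x: x ** 2

print(square.__name__)         # square
print(square_lambda.__name__)  # <lambda>

# A lambda can be written inline and used once, with no name at all.
squares = list(map(lambda x: x ** 2, [1, 2, 3]))
print(squares)                 # [1, 4, 9]
```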
What if you want to create a new column whose values depend on another column? There are a lot of ways to accomplish this (see this Stack Overflow answer). Some functions I use:
- isin() method
- where() method
- You can also use an if else statement inside a lambda function
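Here is one way the three options above might look, on a hypothetical stand-in called `cars_demo` (hand-typed, illustrative values). The new column names are made up, and the sketch uses the pandas `Series.where()` method; NumPy also has an `np.where()` function that works a little differently.

```python
import pandas as pd

# Hypothetical stand-in for the cars data (illustrative values only).
cars_demo = pd.DataFrame({
    "car": ["Mazda RX4", "Datsun 710", "Hornet Sportabout", "Valiant"],
    "mpg": [21.0, 22.8, 18.7, 18.1],
    "cyl": [6, 4, 8, 6],
})

labeled = cars_demo.assign(
    # isin(): True when cyl is one of the listed values.
    small_engine=lambda d: d["cyl"].isin([4, 6]),
    # where(): keep mpg where the condition is True, NaN everywhere else.
    mpg_if_small=lambda d: d["mpg"].where(d["cyl"].isin([4, 6])),
    # if/else inside a lambda, applied to each value of cyl.
    engine_size=lambda d: d["cyl"].apply(lambda c: "big" if c >= 8 else "small"),
)

print(labeled)
```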
[] or .query()
APIs and JSON: A Primer
Application Programming Interfaces (APIs)
Representational State Transfer (REST APIs)
Over the course of the ’00s, another Web services technology, called Representational State Transfer, or REST, began to overtake [all other tools] for the purpose of transferring data. One of the big advantages of programming using REST APIs is that you can use multiple data formats — not just XML, but JSON and HTML as well. As web developers came to prefer JSON over XML, so too did they come to favor REST over SOAP. As Kostyantyn Kharchenko put it on the Svitla blog, “In many ways, the success of REST is due to the JSON format because of its easy use on various platforms.”
Today, JSON is the de-facto standard for exchanging data between web and mobile clients and back-end services. ref
JavaScript Object Notation
Well, when you’re writing frontend code in Javascript, getting JSON data back makes it easier to load that data into an object tree and work with it. And JSON formats data in a more succinct way, which saves bandwidth and improves response times when sending messages back and forth to a server.
In a world of APIs, cloud computing, and ever-growing data, JSON has a big role to play in greasing the wheels of a modern, open web. ref
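The same "load it into an object tree" idea carries over to Python. The sketch below uses the built-in `json` module on a hand-written string; the nested "engine" field is a made-up example of what an API response might look like, not the format of any real API.

```python
import json

# A hand-written JSON string standing in for an API response (hypothetical shape).
payload = '{"car": "Mazda RX4", "mpg": 21, "engine": {"cyl": 6, "hp": 110}}'

# json.loads turns JSON text into nested Python dicts and lists.
record = json.loads(payload)
print(record["engine"]["cyl"])

# json.dumps goes the other way, from Python objects back to JSON text.
text = json.dumps(record)
print(text)
```

Nested objects like "engine" become dictionaries inside dictionaries, which is why JSON data sometimes needs extra work before it fits neatly into a dataframe.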
Other Resources
- RESTful APIs in 100 Seconds (video)
- Python API Tutorial: Getting Started with APIs
- Big List of Free and Open Public APIs (No Auth Needed)