Welcome to class!
Announcements
Gratitude Journal
Validating visuals
You’re going to make a lot of bar charts!
- Simple bar chart tutorial.
- Make Altair do the counting for you! Tutorials here and here.
Getting started on Grand Question 3
One-hot encoding
Project 5 asks you to “one-hot encode all columns that have categories” and “convert all yes/no responses to 1/0 numeric”.
The pd.get_dummies function can be used to create one-hot encoded variables. The pd.get_dummies documentation is a great place to start.
After reading the documentation, study the code below and get started on Grand Question 3.
# %%
# When we use machine learning to predict salary,
# let's only look at people who have seen at least
# one Star Wars film
starwars = starwars.query('have_seen_any == "Yes"')
# Discuss: what's a better way to filter out people
# who haven't seen Star Wars?
# %%
# Format columns for machine learning
# Let's try this first: convert categories to "one-hot" encodings
shot_first_onehot = pd.get_dummies(starwars.shot_first)
shot_first_onehot
# What's the difference between the code above
# and the line below? Which one is better?
shot_first_onehot = pd.get_dummies(starwars.shot_first, drop_first=True)
shot_first_onehot
# %%
# 'get_dummies()' can also be used to convert yes/no answers to 0/1
episode_i = pd.get_dummies(starwars.seen_film_i__the_phantom_menace)
episode_i
# %%
episode_i.value_counts()
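To make the `drop_first` question concrete, here is a small sketch on a made-up data frame. The column names mirror the ones above, but the rows, the "Unsure" category, and the `toy` name are hypothetical. It also shows one possible alternative for yes/no columns, `Series.map`, which is an assumption about what the project accepts rather than the required approach.

```python
import pandas as pd

# Hypothetical stand-in for the survey data; the real frame has many more columns.
toy = pd.DataFrame({
    "have_seen_any": ["Yes", "No", "Yes", "Yes"],
    "shot_first": ["Han", "Greedo", "Unsure", "Han"],
})

# Full one-hot encoding: one 0/1 column per category (sorted alphabetically).
full = pd.get_dummies(toy["shot_first"], dtype=int)

# drop_first=True removes the first category ("Greedo" here).
# A row of all zeros then means "Greedo", so no information is lost.
reduced = pd.get_dummies(toy["shot_first"], drop_first=True, dtype=int)

# For a plain yes/no column, an explicit mapping gives a single 1/0 column.
seen_any = toy["have_seen_any"].map({"Yes": 1, "No": 0})

print(full)
print(reduced)
print(seen_any)
```

Passing `dtype=int` asks for 0/1 integers; depending on the pandas version, the default output may be boolean instead.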