Day 22: The 20% for ML

Questions about pd.get_dummies()

One-hot encoding

  1. One-hot encode all categorical columns.
  2. Convert all yes/no responses to numeric 1/0 values.
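The two steps above can be sketched on a small made-up DataFrame. The column names `color` and `likes_ml` and their values are hypothetical stand-ins, not the course data.

```python
import pandas as pd

# Hypothetical toy data: one categorical column and one yes/no column.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "likes_ml": ["Yes", "No", "Yes"],
})

# Step 2: map yes/no answers to 1/0 (values not in the dict, e.g. NaN, become NaN).
df["likes_ml"] = df["likes_ml"].map({"Yes": 1, "No": 0})

# Step 1: one-hot encode the categorical column; dtype=int gives 0/1 instead of True/False.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```

Using `.map()` with an explicit dictionary makes any unexpected spelling (for example "yes" in lowercase) show up as NaN rather than being silently miscoded.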

Which columns are going to be problematic for the function pd.get_dummies()?
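One way to explore this question is a toy example with hypothetical columns. It checks two behaviors: when given a whole DataFrame, `pd.get_dummies()` only encodes object, string, or category columns, so categories stored as numeric codes pass through unencoded; and a missing value produces a row of all zeros unless `dummy_na=True` is set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'region' stores categories as numeric codes,
# and 'answer' has a missing value.
df = pd.DataFrame({
    "region": [1, 2, 3],
    "answer": ["Yes", np.nan, "No"],
})

# With columns=None, only object/string/category columns are encoded,
# so the numeric 'region' column is left as-is.
encoded = pd.get_dummies(df, dtype=int)
print(encoded)

# dummy_na=True adds an extra column flagging the missing value.
with_na = pd.get_dummies(df["answer"], dummy_na=True, dtype=int)
print(with_na)
```

Numeric-coded categories can be encoded by listing them explicitly in `columns=` (or converting them to strings first).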

What is the default behavior of pd.get_dummies() for the columns it creates? Should we change that behavior?
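A minimal sketch of the default, using a hypothetical three-level column: `pd.get_dummies()` creates one column per category, so every row sums to exactly 1 and any one column can be computed from the others. That redundancy is a problem for some models (for example, linear regression with an intercept), which is the usual reason to pass `drop_first=True`. Also note that in pandas 2.0 and later the new columns are `bool` by default (older versions used `uint8`), so `dtype=int` is passed here to get 0/1.

```python
import pandas as pd

# Hypothetical toy column with three categories.
s = pd.Series(["low", "mid", "high", "mid"])

# Default: one column per category (sorted: high, low, mid).
full = pd.get_dummies(s, prefix="level", dtype=int)
print(full.sum(axis=1).tolist())  # [1, 1, 1, 1]

# drop_first=True keeps k-1 columns by dropping the first category.
reduced = pd.get_dummies(s, prefix="level", drop_first=True, dtype=int)
print(list(reduced.columns))  # ['level_low', 'level_mid']
```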

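import pandas as pd
# Examples of dropping one of the encoded variables (assumes dat was loaded earlier).
pd.get_dummies(dat.star_wars_fans, drop_first=True)  # drops the first category in sorted order
pd.get_dummies(dat.shot_first).drop("I don't understand this question", axis=1)  # drops a chosen category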
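Dropping a dummy column loses no information, as long as every row belongs to exactly one category (no missing values). The sketch below uses a hypothetical column in which "unsure" plays the role of the category being dropped, then recovers it as 1 minus the sum of the remaining columns.

```python
import pandas as pd

# Hypothetical toy column; "unsure" is the category we choose to drop.
s = pd.Series(["yes", "no", "unsure", "yes"])
full = pd.get_dummies(s, dtype=int)
reduced = full.drop("unsure", axis=1)  # same idea as the shot_first example

# The dropped column is implied by the others: 1 minus their row sum.
recovered = 1 - reduced.sum(axis=1)
print((recovered == full["unsure"]).all())  # True
```

Dropping a specific category, rather than using `drop_first=True`, lets you choose which level acts as the baseline.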

Open programming time

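# Combine the prepared pieces side by side (income_num and education are assumed to be built earlier).
dat_example = pd.concat([
    income_num,   # numeric income column(s)
    education     # encoded education column(s)
], axis=1)  # axis=1 joins column-wise, matching rows by index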
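To see what `axis=1` does in the concat above, here is a sketch with hypothetical stand-ins for `income_num` and `education`. `pd.concat` lines rows up by index label, not by position, so if one piece has been filtered or reindexed, the result gains NaN rows instead of raising an error.

```python
import pandas as pd

# Hypothetical stand-ins for income_num and education, sharing the index 0..2.
income_num = pd.Series([50_000, 75_000, 30_000], name="income")
education = pd.get_dummies(pd.Series(["HS", "BA", "HS"]), prefix="edu", dtype=int)

# axis=1 places the pieces side by side, matching rows by index label.
dat_example = pd.concat([income_num, education], axis=1)
print(dat_example.columns.tolist())  # ['income', 'edu_BA', 'edu_HS']

# If one piece's index no longer matches (here shifted to 1..3),
# concat keeps the union of labels and fills the gaps with NaN.
misaligned = pd.concat([income_num, education.set_axis([1, 2, 3])], axis=1)
print(misaligned.shape)  # (4, 3)
```

Checking `dat_example.shape` against the original number of rows is a quick way to catch this kind of misalignment.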