Questions about pd.get_dummies()
One-hot encoding
- One-hot encode all columns that have categories.
- Convert all yes/no responses to 1/0 numeric.
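A minimal sketch of both steps on a small made-up frame. The column names (`seen_any`, `location`) and their values are hypothetical stand-ins, not the real survey columns.

```python
import pandas as pd

# Hypothetical toy data; not the real survey columns.
toy = pd.DataFrame({
    "seen_any": ["Yes", "No", "Yes"],
    "location": ["West", "South", "West"],
})

# Yes/No -> 1/0 with an explicit mapping; any unmapped value becomes NaN.
seen_num = toy["seen_any"].map({"Yes": 1, "No": 0})

# One-hot encode a categorical column; dtype=int gives 0/1 instead of booleans.
location_dummies = pd.get_dummies(toy["location"], dtype=int)

# Put the numeric pieces side by side.
toy_ml = pd.concat([seen_num, location_dummies], axis=1)
```

An explicit `map` is one option among several; it has the advantage that an unexpected response shows up as NaN rather than being silently encoded.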
Which columns are going to be problematic for the function pd.get_dummies()?
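One way to start answering this is to look for three common trouble spots: columns with nearly as many levels as rows, numeric columns, and columns with missing values. The sketch below uses a made-up frame; `respondent_id`, `age_group`, and `household_size` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data illustrating three kinds of awkward columns.
toy = pd.DataFrame({
    "respondent_id": ["a1", "b2", "c3", "d4"],        # unique per row
    "age_group": ["18-29", "30-44", "18-29", None],   # has a missing value
    "household_size": [1, 2, 2, 4],                   # numeric
})

# Count distinct levels per column; an ID-like column yields one dummy per row.
levels = toy.nunique()

# Given a whole DataFrame, numeric columns pass through unencoded by default.
encoded = pd.get_dummies(toy)

# A missing value becomes an all-zero row unless dummy_na=True adds an indicator.
age_default = pd.get_dummies(toy["age_group"])
age_with_na = pd.get_dummies(toy["age_group"], dummy_na=True)
```

Checking `levels` before encoding is a cheap way to spot columns that would blow up the width of the result.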
What is the default behavior of pd.get_dummies() for the columns that are created? Should we change that behavior?
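As a sketch for exploring this question: by default `pd.get_dummies()` creates one column per level and drops none, so the columns in each row sum to 1. In pandas 2.0 and later the new columns are boolean unless `dtype` is set. The series below is made up, and its values are not the real survey responses.

```python
import pandas as pd

# Hypothetical responses; not the real survey levels.
shot = pd.Series(["Han", "Greedo", "Han", "Unsure"], name="shot_first")

# Default: one column per level, in sorted order, none dropped.
default = pd.get_dummies(shot)

# drop_first=True removes the first level in sorted order ("Greedo" here).
reduced = pd.get_dummies(shot, drop_first=True)

# dtype=int gives 0/1 columns instead of booleans.
as_int = pd.get_dummies(shot, drop_first=True, dtype=int)
```

Dropping one level avoids perfectly collinear columns, which matters for models such as linear regression; whether to drop, and which level, is a modeling choice rather than a fixed rule.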
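# Examples of dropping one of the encoded variables.
# drop_first=True drops the first level in sorted order.
pd.get_dummies(dat.star_wars_fans, drop_first=True)
# Alternatively, drop a specific level by name after encoding.
pd.get_dummies(dat.shot_first).drop(columns="I don't understand this question")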
Open programming time
dat_example = pd.concat([
    income_num,
    education
], axis=1)
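A self-contained sketch of the same pattern, for experimenting during open programming time. The contents of `income_num` and `education` here are invented stand-ins (the real objects are built from `dat`), and the `education` prefix is an assumption made for readability.

```python
import pandas as pd

# Hypothetical stand-ins for the pieces built from `dat`; values are made up.
income_num = pd.Series([12500, 37500, 62500], name="income_num")
education = pd.get_dummies(
    pd.Series(["HS", "BA", "HS"]), prefix="education", dtype=int
)

# axis=1 places the pieces side by side, aligning rows on the index.
dat_example = pd.concat([income_num, education], axis=1)
```

Because `pd.concat` with `axis=1` aligns on the index, every piece should keep the original row index; a piece with a reset or filtered index would introduce NaN rows.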