Day 21: May the ML columns be with you

Moving from categories to values.

  1. Create one or more additional columns that convert the income ranges to numbers.
  2. Create one or more additional columns that convert the age ranges to numbers.
  3. Create one or more additional columns that convert the school groupings to numbers.
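
One way to approach the three steps above is to map each range label to a representative number, such as the midpoint of the range. The sketch below is only an illustration: the column names (`income`, `age`, `school`), the range labels, and the chosen numbers are all assumptions, not values from the actual dataset.

```python
import pandas as pd

# Hypothetical survey data; column names and range labels are assumptions.
df = pd.DataFrame({
    "income": ["0-25k", "25k-50k", "50k-100k", "25k-50k"],
    "age": ["18-24", "25-34", "35-44", "18-24"],
    "school": ["High school", "Bachelor", "Master", "Bachelor"],
})

# Assumed lookup tables: midpoint of each range, or approximate years of schooling.
income_mid = {"0-25k": 12_500, "25k-50k": 37_500, "50k-100k": 75_000}
age_mid = {"18-24": 21, "25-34": 29.5, "35-44": 39.5}
school_years = {"High school": 12, "Bachelor": 16, "Master": 18}

# Keep the original columns and add numeric versions alongside them.
df["income_num"] = df["income"].map(income_mid)
df["age_num"] = df["age"].map(age_mid)
df["school_num"] = df["school"].map(school_years)

print(df)
```

Any label missing from a lookup table becomes NaN after `.map()`, so checking `df["income_num"].isna().sum()` is a quick way to catch labels that were overlooked.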

Why are we converting the columns to numerical values?

One-hot encoding

  1. One-hot encode all categorical columns.
  2. Convert all yes/no responses to numeric 1/0.
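
The two steps above might look like the following sketch. The column names (`region`, `owns_home`) and their values are hypothetical. The yes/no column is mapped to 1/0 first so that it stays a single column, and only the remaining categorical column is passed to `pd.get_dummies()`.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions.
df = pd.DataFrame({
    "region": ["north", "south", "north"],
    "owns_home": ["Yes", "no", "yes "],
})

# Normalize case and whitespace, then map yes/no to 1/0.
yes_no = {"yes": 1, "no": 0}
df["owns_home"] = df["owns_home"].str.strip().str.lower().map(yes_no)

# One-hot encode only the listed categorical column; dtype=int gives 0/1 values.
encoded = pd.get_dummies(df, columns=["region"], dtype=int)

print(encoded)
```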

Which columns are going to be problematic for the function pd.get_dummies()?
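
Two cases that may be worth checking are sketched below, using made-up column names. When no `columns` argument is given, `pd.get_dummies()` only encodes text and categorical columns, so a category stored as integers passes through unchanged. A text column with a unique value per row, such as an identifier, produces one new column per row.

```python
import pandas as pd

# Hypothetical data; all column names are assumptions.
df = pd.DataFrame({
    "respondent_id": ["a1", "b2", "c3", "d4"],      # unique value in every row
    "region_code": [1, 2, 1, 3],                    # a category stored as integers
    "region": ["north", "south", "north", "east"],  # an ordinary text category
})

# Default call: region_code is left as-is, respondent_id expands to 4 columns.
default = pd.get_dummies(df)

# Dropping the identifier and naming the columns explicitly encodes both categories.
explicit = pd.get_dummies(
    df.drop(columns="respondent_id"),
    columns=["region_code", "region"],
)

print(default.columns.tolist())
print(explicit.columns.tolist())
```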

What is the default behavior of pd.get_dummies() for the columns it creates? Should we change that behavior?
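
As a starting point for that discussion, the sketch below compares a default call with one that sets `drop_first=True` and `dtype=int`. The `region` column is hypothetical. By default, every category gets its own column named `<column>_<value>`; recent pandas versions return boolean values, while older versions returned small integers, so printing `default.dtypes` shows what your installation does.

```python
import pandas as pd

# Hypothetical data; the column name and values are assumptions.
df = pd.DataFrame({"region": ["north", "south", "east", "north"]})

# Default: one column per category (k columns for k categories).
default = pd.get_dummies(df)

# drop_first=True drops the first category (k-1 columns); dtype=int forces 0/1.
reduced = pd.get_dummies(df, drop_first=True, dtype=int)

print(default.dtypes)
print(reduced)
```

Dropping one category avoids perfectly redundant columns, which matters for some linear models, but it also makes the dropped category implicit, so whether to change the default depends on the model being used.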