Project 4: Classifying Homes

Background

The Clean Air Act of 1970 was the beginning of the end for the use of asbestos in home building. By 1976, the U.S. Environmental Protection Agency (EPA) had been given authority to restrict the use of asbestos in paint. Homes built during and before this period are known to contain materials with asbestos. You can read more about the ban at this link.

A large portion of the State of Colorado's residential dwelling data is missing the year built, and the state would like you to build a predictive model that can classify whether a house was built before 1980.

Colorado has given you home sales data for the city of Denver from 2013 on which to train your model. The state says all the column names should be descriptive enough for your modeling, and it would like you to use the latest machine learning methods.
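Because accuracy should be judged on homes the model has never seen, a common first step is to hold out part of the Denver data as a test set. Below is a minimal standard-library sketch of a stratified split, so the training and test sets keep the same share of pre-1980 homes. The 25% test size and seed 42 are arbitrary assumptions, and the function name stratified_split is hypothetical; scikit-learn's train_test_split(X, y, test_size=0.25, stratify=y, random_state=42) does the same job if you are already using that library.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def stratified_split(
    X: Sequence[Row],
    y: Sequence[int],
    test_size: float = 0.25,
    seed: int = 42,
) -> Tuple[List[Row], List[Row], List[int], List[int]]:
    """Hold out test_size of each class so both splits keep the same
    before-1980 / during-or-after-1980 proportions."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    rng = random.Random(seed)  # fixed seed -> the same split every run

    # Group row indices by their 0/1 label.
    by_class: Dict[int, List[int]] = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)

    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # per-class share for the test set
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])

    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )
```

Fit candidate models only on the training portion and report accuracy on the test portion; tuning against the test set would make the reported accuracy optimistic.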

Data

Download: dwellings_denver.csv, dwellings_ml.csv
Optional Download: dwellings_neighborhoods_ml.csv
Information: Data description
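A minimal sketch of loading dwellings_ml.csv with Python's csv module, assuming every column is numeric with no blanks. The label name before1980 comes from the grand questions; the dropped columns yrbuilt and parcel are hypothetical names to check against the data description. If a year-built column exists it would give the answer away, and a parcel ID identifies a house rather than describing it, so neither belongs among the model inputs. The function name load_features_and_labels is also hypothetical; pandas users would typically reach for pd.read_csv followed by DataFrame.drop(columns=[...]).

```python
import csv
from typing import Dict, Iterable, List, Tuple

# "before1980" is named in the grand questions; "yrbuilt" and "parcel"
# are ASSUMED column names to verify against the data description.
LABEL = "before1980"
LEAKY_OR_ID_COLUMNS = ("yrbuilt", "parcel")


def load_features_and_labels(
    lines: Iterable[str],
    label: str = LABEL,
    drop: Iterable[str] = LEAKY_OR_ID_COLUMNS,
) -> Tuple[List[Dict[str, float]], List[int]]:
    """Split each CSV row into a feature dict and a 0/1 label."""
    dropped = set(drop) | {label}
    features: List[Dict[str, float]] = []
    labels: List[int] = []
    for row in csv.DictReader(lines):
        # int(float(...)) accepts both "1" and "1.0" in the label column.
        labels.append(int(float(row[label])))
        # Assumes all remaining columns are numeric; text columns
        # would need encoding before they could be used here.
        features.append({k: float(v) for k, v in row.items() if k not in dropped})
    return features, labels
```

To use it on the downloaded file, open it with open("dwellings_ml.csv", newline="") and pass the file object as lines.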

Grand Questions

  1. Create 2-3 charts that evaluate potential relationships between the house variables and the variable before1980. Explain what you learn from the charts that could help a machine learning algorithm.
  2. Build a classification model labeling houses as being built “before 1980” or “during or after 1980”. Your goal is to reach 90% accuracy. Explain your final model choice (algorithm, tuning parameters, etc.) and describe what other models you tried.
  3. Justify your classification model by discussing the most important features selected by your model. This discussion should include a chart and a description of the features.
  4. Describe the quality of your classification model using 2-3 different evaluation metrics. You also need to explain how to interpret each of the evaluation metrics you use.
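Accuracy, precision, recall, and F1 are common choices for the evaluation question. The sketch below computes them from a confusion-matrix tally using only the standard library, assuming before1980 is coded as 1 for homes built before 1980 (the positive class) and 0 otherwise. Accuracy is the share of all homes labeled correctly; precision is the share of homes predicted pre-1980 that really are; recall is the share of truly pre-1980 homes the model catches; F1 is the harmonic mean of precision and recall. Because a missed pre-1980 home could mean an overlooked asbestos risk, recall may deserve extra attention. The function names are hypothetical; scikit-learn provides accuracy_score, precision_score, recall_score, f1_score, and confusion_matrix in sklearn.metrics.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Tally the four outcomes, treating 1 (built before 1980) as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def _safe_ratio(num: int, den: int) -> float:
    # Report 0.0 instead of dividing by zero (e.g. no positive predictions).
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    # Accuracy: share of all houses labeled correctly (the 90% goal).
    accuracy = _safe_ratio(c["tp"] + c["tn"], sum(c.values()))
    # Precision: of the houses predicted pre-1980, the share that really are.
    precision = _safe_ratio(c["tp"], c["tp"] + c["fp"])
    # Recall: of the houses truly built before 1980, the share the model found.
    recall = _safe_ratio(c["tp"], c["tp"] + c["fn"])
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with y_true = [1, 1, 0, 0, 1] and y_pred = [1, 0, 0, 1, 1], the tally is 2 true positives, 1 false positive, 1 true negative, and 1 false negative, so accuracy is 0.6 and precision, recall, and F1 are each 2/3.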

Deliverables

Use the provided template to submit your case study. The template has three sections:

  1. A short summary that describes the results of the project and the tools you used. (Think “elevator pitch”.)
  2. Answers to the grand questions. Each answer should include a written description of your results, and may also include charts or tables.
  3. An appendix that provides your commented code. Your code comments should justify any decisions you had to make while programming.