Day 1: Intro to ML

Welcome to class!

Announcements

  1. Project 3 - Getting pickier about good communication
    • Tables in reports should be as concise as possible (no duplicate information)
    • Career batting average
    • Meaningful report name (Drop “Client Report”)
    • Meaningful section headers so the table of contents is useful (don’t call them “Question 1”)
    • Don’t include “My useless chart” from the template
  2. Ask for help!
    • Computing lab
    • Computing lab Slack channel (search)
    • Slack classmates or general channel

Spiritual Thought

Genesis 1:1 and Machine Learning
Are facts true?


Pictionary!



From Sebastian Thrun:

AI is able to learn ‘rules’ from highly repetitive data.


The single most important thing for AI to accomplish in the next ten years is to free us from the burden of repetitive work.


Your Turn: Student Classification Problem

Can we predict if a student is from Utah?


Your Turn: Features and Targets

Import dwellings.csv. With a neighbor:

  1. Try to describe the data. Explain what each observation (row) is and what measurements we have on that observation (columns).
  2. Now try describing the modeling (machine learning) we are going to do in terms of “features” and “targets”. Watch out - are there any columns that are the target in disguise? (You may need to review the project goal.)
  3. What features do you expect to have a strong relationship with the target?
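
The features/target split from step 2 can be sketched in a few lines. This is only an illustration: the inline sample below stands in for dwellings.csv, and every column name in it (`parcel_id`, `year_built`, `live_area`, `num_baths`, `built_before_1980`) is made up, so the real file's columns and the project's actual target may differ. The sketch assumes the target is a yes/no column and that `year_built` would give the answer away, which makes it a "target in disguise" that has to be dropped along with the ID column.

```python
import csv
import io

# Tiny made-up sample standing in for dwellings.csv.
# The real file's columns and values may differ.
SAMPLE_CSV = """parcel_id,year_built,live_area,num_baths,built_before_1980
A1,1965,1200,1,1
A2,1992,2100,3,0
A3,1978,1500,2,1
A4,2005,2600,3,0
"""

TARGET = "built_before_1980"        # hypothetical target column
DROP = {"parcel_id", "year_built"}  # row ID plus a "target in disguise"


def split_features_target(rows, target, drop):
    """Return (X, y): one feature dict and one target value per row."""
    X = [
        {k: float(v) for k, v in row.items() if k != target and k not in drop}
        for row in rows
    ]
    y = [int(row[target]) for row in rows]
    return X, y


# Each row is one observation (one dwelling); each key is one measurement.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
X, y = split_features_target(rows, TARGET, DROP)

print(X[0])  # features kept for the first dwelling
print(y)     # target values, one per dwelling
```

If `year_built` stayed in the features, a model could "predict" the target perfectly without learning anything useful about the other columns.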

Before Next Class

The goal of Question 1 is to help us with “feature selection”.

  • Remember: Overfitting happens when some boundaries are based on distinctions that don’t make a difference.
  • More data (for example, more feature columns) does not always lead to better models. (Occam’s Razor)
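
A small sketch of the two points above, using made-up one-dimensional data rather than anything from the project. It compares a "memorizer" (1-nearest-neighbor, which draws a boundary around every training point, including two deliberately mislabeled ones) with a single-cutoff rule. Both models and all the numbers are assumptions chosen for illustration, not the method used in class.

```python
# Made-up 1-D data: the true rule is "label is 1 when x >= 5".
# Two training labels (x=2 and x=7) are deliberately wrong to act as noise.
train_x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
train_y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
test_x = [0.4, 1.6, 2.2, 3.3, 4.1, 5.2, 6.4, 7.1, 8.3, 9.2]
test_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def accuracy(predict, xs, ys):
    """Fraction of points where predict(x) matches the label."""
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


def memorizer(x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def make_threshold_rule(t):
    """Simple model: predict 1 when x is above a single cutoff t."""
    return lambda x: int(x > t)


# Fit the simple model: try the midpoints between training points and
# keep the cutoff with the best training accuracy.
cutoffs = [x + 0.5 for x in train_x[:-1]]
best_t = max(
    cutoffs,
    key=lambda t: accuracy(make_threshold_rule(t), train_x, train_y),
)
simple_rule = make_threshold_rule(best_t)

for name, model in [("memorizer", memorizer), ("simple rule", simple_rule)]:
    train_acc = accuracy(model, train_x, train_y)
    test_acc = accuracy(model, test_x, test_y)
    print(f"{name}: train={train_acc:.1f} test={test_acc:.1f}")
# memorizer: train=1.0 test=0.7
# simple rule: train=0.8 test=1.0
```

The memorizer is perfect on the data it has seen because it treats the two noisy labels as real distinctions, and it pays for that on new data. The simpler rule ignores them and generalizes better.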

Common questions: