Day 2: Python for DS Introduction

Finishing some setup

Reading for Comprehension vs. Reading for Reference

  • How do we use the readings in this class?

Reading is the secret! Careful reading of the provided material will bless you now and in the long run.

Read 500 pages like this every day. That’s how knowledge works. It builds up, like compound interest. All of you can do it, but I guarantee not many of you will do it.

-Warren Buffett

In an interactive chunk, run the following.

import sys
!{sys.executable} -m pip install numpy pandas scikit-learn altair

We use pip because VS Code fixed all the problems that I have historically had with it. It just works now, and it is the method our students have been taught in CSE 110 and CSE 111.
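After the install finishes, one quick way to confirm it worked is to ask Python which package versions it can see. This is a minimal sketch and not part of the original setup steps. The helper name `installed_versions` is made up for illustration, and it relies only on the standard library's `importlib.metadata` (Python 3.8+).

```python
# %%
# check that the packages from the pip command are visible to this interpreter
from importlib.metadata import PackageNotFoundError, version

# same package names used in the pip install line
PACKAGES = ["numpy", "pandas", "scikit-learn", "altair"]


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(PACKAGES)
for name, ver in report.items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```

If any line prints NOT INSTALLED, the interpreter selected in VS Code is probably not the one pip installed into.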

Can we practice making a chart in Altair with VS Code?

Let’s review the power of Python Interactive

Setting up your script

A good data science .py script will have packages and data loaded at the top. Usually you have a few short commented sentences that describe the script's purpose.

# %%
# import pandas, altair, numpy
import pandas as pd
import altair as alt
import numpy as np

# %%
# load data
# handgrenade data https://github.com/byuidatascience/data4soils/blob/master/data-raw/cfbp_handgrenade/cfbp_handgrenade.csv

url = 'https://github.com/byuidatascience/data4soils/raw/master/data-raw/cfbp_handgrenade/cfbp_handgrenade.csv'

dat = pd.read_csv(url)

  1. Encode the row and column to the axes.
  2. Color the hmx points using the ‘goldorange’ color scheme.
  3. Use mark_square() and make the square sizes 500.

  1. Encode the x-axis as binned.
  2. Encode the y-axis as counts.
  3. Configure the title to a fontSize of 20.
  4. Use properties to place the title.