Finishing some setup
Reading for Comprehension vs. Reading for Reference
- How do we use the readings in this class?
Reading is the secret! Careful reading of the material provided will bless you now and in the long run.
- The Idea of the Holy: 256 pages
- Journal Articles (I swear I have read this article almost 100 times)
- Optional References
Read 500 pages like this every day. That’s how knowledge works. It builds up, like compound interest. All of you can do it, but I guarantee not many of you will do it.
Each project has a Readings section. See Project 1 for example. Let’s look at the readings and figure out what we need.
In an interactive chunk, run the following.
import sys
!{sys.executable} -m pip install numpy pandas scikit-learn altair
We install with pip because VS Code fixed all the problems that I have historically had with it. It just works now, and it is the method our students have been taught in CSE 110 and CSE 111.
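After the install finishes, it can help to confirm that the packages are visible to the interpreter VS Code is using. The following is a minimal sketch using only the standard library; the helper name installed_versions is made up for illustration and is not part of the course material.

```python
# %%
# Check which of the course packages are installed in this interpreter.
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["numpy", "pandas", "scikit-learn", "altair"]))
```

A None next to a package name suggests the install ran against a different interpreter than the one selected in VS Code.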
Can we practice making a chart in Altair with VS Code?
Let’s review the power of Python Interactive
# %% in my .py script is much better than Jupyter notebooks (.ipynb).
- If we hope to have our code work in a production environment then Jupyter is problematic.
- Caching and code chunks are problematic
- https://medium.com/@_orcaman/jupyter-notebook-is-the-cancer-of-ml-engineering-70b98685ee71
Setting up your script
A good data science .py script will have packages and data loaded at the top. Usually you have a few short commented sentences that describe the script's purpose.
# %%
# import pandas, altair, numpy
import pandas as pd
import altair as alt
import numpy as np
# %%
# load data
# handgrenade data https://github.com/byuidatascience/data4soils/blob/master/data-raw/cfbp_handgrenade/cfbp_handgrenade.csv
url = 'https://github.com/byuidatascience/data4soils/raw/master/data-raw/cfbp_handgrenade/cfbp_handgrenade.csv'
dat = pd.read_csv(url)
alt.Chart(dat).encode()
- Encode the row and column to the axes.
- Color the hmx points using the ‘goldorange’ color scheme.
- Use mark_square() and make the square sizes 500.
- Encode the x-axis as binned.
- Encode the y-axis as counts.
- Configure the title to a fontSize of 20.
- Use properties to place the title.