Details

Your coding challenge will help you demonstrate the skills you have developed this semester. Here are a few essential items.

  1. Your goal is to demonstrate your data science coding abilities. Get through as many items with a rough implementation as possible.
  2. Get your code to match our outputs as close as possible, but don’t stress over minute details.
  3. Keep most of the code you type. If you end up not using specific parts, comment them out and include them at the bottom.
  4. Use the entire hour and may not finish.
  5. Submit a .md and a .pdf report with your output and code for each challenge.

Please use the challenge template to submit your work.

import pandas as pd 
import altair as alt
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import metrics

Challenge 1

Split Entry houses are a failed building experiment in the United States. Use the data from our Denver homes project, as shown below, to recreate the following graphic.


url = 'https://github.com/byuidatascience/data4dwellings/raw/master/data-raw/dwellings_denver/dwellings_denver.csv'
dat_home = pd.read_csv(url).sample(n=4500, random_state=15)

Challenge 2

Our computations can’t be done with missing values. Programmatically replace all the lost values with 125 and make a box-plot.

mister = pd.Series(["lost", 15, 22, 45, 31, "lost", 85, 38, 129, 80, 21, 2])

Challenge 3

Our computations can’t be done with missing values. Programmatically replace all the lost values with 125 and report the mean rounded to two decimals.

mister = pd.Series(["lost", 15, 22, 45, 31, "lost", 85, 38, 129, 80, 21, 2])

Challenge 4

Programmatically read in the following JSON file, keep only the cases column and return a markdown table that has country in the rows and cases for 1999 and 2000 in the columns. Your table will have six cells with values.

url = 'https://github.com/byuidatascience/data4python4ds/raw/master/data-raw/table1/table1.json'

Challenge 5

Use our cleaned example of the star wars data from project 6 to predict the gender of the respondent to the survey. Report your precision and a feature importance plot.

  1. Use test_size = .20 and random_state = 2020 in train_test_split()
  2. Use the GradientBoostingClassifier() method.
url = "http://byuistats.github.io/CSE250-Course/data/clean_starwars.csv"
dat = pd.read_csv(url)