Class Flow

Introduction (2 days)

As we move our computing and data storage to the cloud, we will take many of the tools we have already learned and integrate them into a new experience. While Math 335, CSE 250, CSE 450, and Math 425 all apply, none of them is quite the same as computing in the cloud. As the course progresses, we will see the skills from those classes come together as we learn to leverage data in the cloud.

Day 1: Introduction

Day 2: GitHub Collaboration and Docker

Deciding on a cloud framework (2 days)

Day 1: Introduction to project needs

Day 2: Team development time and presentations

Exploring Docker as a Data Science Tool (4 days)

Day 1: Docker 101

Day 2: What is Docker?

Day 3: Docker for data science

Day 4: Finishing the Docker small project

Exploring nonprofits and Form 990 filings (2 days)

Day 1: What data do we have on nonprofits?

Day 2: Describing the data

Learning Spark (4 days)

Going deeper with Spark, SparkSQL, and SparkML (4 days)

Deciding on a data story (3 days)

Day 1: Exploring data

Day 2: Building a story and use case

Day 3: Pitching the data for the class project

Working in teams on cloud analytics projects (2 days)

Day 1: Deeper into GitHub and Git for remote work and collaboration

Day 2: Formalizing our data and joint work

Project Exploration and Development (4 days)

Days 1-3: Open programming, data exploration, and use case development

Day 4: Mid-project presentations of ideas and class consensus on the path forward

Presentation of projects (2 days)