Introduction (2 days)
As we move our computing and data storage to the cloud, we will take many of the tools we have learned and integrate them into a new experience. Math 335, CSE 250, CSE 450, and Math 425 all apply, but none of them is exactly the same as computing in the cloud. As we move forward, the skills from those classes will come together as we learn to leverage data in the cloud.
Day 1: Introduction
Day 2: GitHub Collaboration and Docker
Deciding on a cloud framework (2 days)
Day 1: Project need introduction
- What is big data?
- Explain the V’s of big data (volume, velocity, variety, veracity, valence, and value).
- Explain how each of the V’s impacts data collection, monitoring, storage, analysis, and reporting.
Day 2: Team development time and presentations
Day 1: Docker 101
Day 2: What is Docker?
Day 3: Docker for data science
Day 4: Finishing the Docker small project
Day 1: What data do we have on Nonprofits?
Day 2: Describing the data
Learning Spark (4 days)
Going deeper with Spark, SparkSQL, and SparkML (4 days)
Deciding on a data story (3 days)
Day 1: Exploring data
Day 2: Building a story and use case
Day 3: Pitching the data for the class project
Working in teams on cloud analytics projects (2 days)
Day 1: Deeper into GitHub and Git for remote work and collaboration
Project Exploration and Development (4 days)
Days 1-3: Open programming, data exploration and use case development
Day 4: Mid-project presentations on ideas and class consensus on the path forward
Presentation of projects (2 days)