Most people would sooner die than think, and most of them do.
-Bertrand Russell-

Overview

This course pulls together many of the diverse capabilities developed throughout the other courses in the data science degree to give students a complete skill set in data programming and a better understanding of it. If you have signed up for this class, you are most likely driven by curiosity and interested in how data decisions are made (sometimes called data intuition). You may also have a more empathetic approach to how the world works and how problems can be solved. Finally, you have an eye for visualization and for how data is communicated to make impactful decisions.[1]

Upon completion of this course, you will be able to use data-driven programming in R for the handling, formatting, and visualization of messy and complex data. You will implement data wrangling techniques and the grammar of graphics process in visualizing complex data. Specifically, as a successful learner, you will be able to

  1. Convert data from varied formats or structures to a desirable format for analysis and visualization.
  2. Clean, transform and merge data attributes/variables appropriately.
  3. Effectively display and communicate meaning from spatial, temporal, and textual data.
  4. Articulate the process, benefits, and challenges of Big Data manipulation.
  5. Use current analysis, presentation, and collaboration tools in the data science field (R, Python, D3.js, GitHub).

The course follows these principles for teaching data science:[2]

  • Organize the course around a set of diverse case studies
  • Integrate computing into every aspect of the course
  • Teach abstraction, but minimize reliance on mathematical notation
  • Structure course activities to realistically mimic a data scientist’s experience
  • Demonstrate the importance of critical thinking/skepticism through examples

See here for other great quotes about data science and learning. It would also be valuable for you to read my learning manifesto.

Competency Assumptions

This is a data coding class. The prerequisites for this course are introductory statistics and introductory programming in Python or R. As such, we assume that you know what a console does and how to execute scripts. We will not focus on traditional statistical hypothesis testing or complex statistical modeling, but we will draw on the concepts needed to visualize uncertainty and variability. A firm understanding of standard deviation and variance is needed. Here is a text review as well.
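As a quick self-check on the standard deviation and variance prerequisite, the short sketch below computes both with Python's standard library. The sample values are invented for illustration and are not course data. Python is used here only because it is one of the prerequisite languages.

```python
import statistics

# Hypothetical measurements, invented for this illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)

# Population variance: average squared deviation from the mean (divide by n).
pop_variance = statistics.pvariance(values)
# Sample variance: same sum of squared deviations, divided by n - 1.
sample_variance = statistics.variance(values)

# Standard deviation is the square root of the matching variance.
pop_sd = statistics.pstdev(values)
sample_sd = statistics.stdev(values)

print(f"mean                = {mean}")
print(f"population variance = {pop_variance}")
print(f"sample variance     = {sample_variance:.3f}")
print(f"population sd       = {pop_sd}")
print(f"sample sd           = {sample_sd:.3f}")
```

For these values the mean is 5, the population variance is 4, and the population standard deviation is 2. The sample versions come out slightly larger because they divide by n - 1 instead of n.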

Course Materials

Online Books


Weekly Format

We will meet for 1.5 hours two days a week and use the following weekly rhythm. Material for each class is found on the class meeting page.

  1. Preparation
    A. Read Assigned being Materials
    B. Spend at least 1/2 hour on class tasks before coming to class
    C. Submit questions for class discussion on Slack.
  2. Class Time First Day of Week
    A. Case Study team meeting presentations from previous Week
    C. Being reading discussion
    D. Class activities
  3. Continue Preparation
    A. Read Assigned doing Material
    B. Make significant progress on class tasks
    C. Submit questions for class discussion on Slack
  4. Class Time Second Day of Week
    A. Review of student questions from reading and daily class tasks
    B. Case Study Questions
    D. Second week class task programming time
  5. Case Study Completion
  6. Rinse and Repeat

Preparation

In my experience, lecture-based training outside of college is even more expensive than it is in college. A week’s worth of training can cost more than a semester of school here at BYUI.[3] Because of this expense, learning how to digest online material and get up to speed on a topic before going to an expert with questions is a valuable skill to develop. I expect you to have completed the assigned reading before class begins. You will also have coding tasks to complete after the reading and before class starts.

Class Time

As described in the weekly format, we will have shorter assignments due at the beginning of each period and weekly case studies. We will use class time to reinforce the programming and visualization concepts needed for the weekly topic covered in the case study.

Semester Project

Each of you will be responsible for creating a data-driven question, finding the data to answer it, and building a visual analysis that answers your question with data. A few notes on this project:

  • This project is done over a semester. If you try to complete it during the last week of the semester you will not succeed.
  • The data science majors will submit this as a part of their degree completion. This project could be a great stepping stone for your senior project.
  • I highly recommend that you do this project well and make it public in your GitHub repository to demonstrate to employers that you have data programming skills.

You can see the semester project task here.

Grading

Grading is a nasty side effect of mass learning and academia. We are in a class at a university and will have to manage this side effect. However, we don’t have to let it control our learning, thinking, or this class. Learning and thinking should motivate each activity.

Specifications Grading

As a team, teacher and student, we have a challenge to become more in three months! We have worked hard to identify the specifications needed for a data visualization specialist (as an undergraduate). My goal is to align your grade with the skill specifications you have mastered. In other words, the grade you want will determine how much work you will do. Individual tasks in the class will not be traditionally graded. You will get full credit if, and only if, your work meets the specified criteria (there is no partial credit on tasks).

In a specifications-grading system, all tasks are evaluated on a high-standards pass/fail basis using detailed checklists of task requirements and expectations.[4] Letter grades are earned by passing marks on a set of tasks. This system provides for a variety of choice and is closer to how learning, and work, is done in the real world. It will be easy for us to tell if work is complete, done in good faith, and consistent with the requirements.

We have five concepts that guide our learning objectives. Click on each to see an example task that we could complete this semester.

  1. Convert data from varied formats or structures to a desirable format for analysis and visualization.
  2. Clean, transform and merge data attributes/variables appropriately.
  3. Effectively display and communicate meaning from spatial, temporal, and textual data.
  4. Articulate the process, benefits, and challenges of Big Data manipulation.
  5. Use current analysis, presentation, and collaboration tools in the data science field (R, Python, D3.js, GitHub).

Semester Deliverables

  1. Completed LinkedIn, GitHub, Slack profiles that have been connected to our BYU-I data science community
  2. A cover letter stating the key concepts and techniques that you learned during our projects and your goals to continue learning in this area - include a grade request that represents your knowledge and task completion
  3. A resume that includes the skills you have learned during our projects
  4. A semester task form that records your completed tasks during the semester
  5. Semester project submission on GitHub
  6. Submit this material electronically.
  7. Bring a printed copy of your cover letter and resume to our exit interview

Competency Scale

Grade       Class Tasks   Case Studies   On Time   Semester Project
Leader
  A         24            13             18/8/3    yes
  A-        22            11             18/8/3    yes
  B+        20            9              18/8/3    yes
Supporter
  B         12            6              7/4/2     no
  B-        10            6              7/4/2     no
Listener
  C         5             5              3/2/1     no
  C-        5             4              3/2/1     no
Asleep
  D         3             4              0/0/0     no

Note: See the tasklist for a description of the ‘On Time’ column values.

In all grade levels above a C-, case study 13 must be completed.

Notes

  • The definitive word is “complete”. Starting a task or getting it almost done does not count as completing it.
  • Data science majors must complete a semester project to get anything higher than a B-.

Coding Challenge

We will have three in-class coding challenges (Task 16, Task 24, & last day of class) that you will need to complete. The coding challenge on the last day of the course is the one that can affect the grade earned above.

I may drop your grade by up to three steps (e.g., from an A- to a B-) from your requested grade, depending on your performance. I will report the grade quality of your first two challenges in I-learn. Generally, students’ requested grades are in line with the caliber of their coding challenge performance, and no change is made to their final grade.

Goals

After reviewing the material above, please make a list of learning goals you have based on this class. We would enjoy talking with you about those goals in the first few weeks of the semester. We look forward to working with you this semester.

Additional Disclaimers

Our class will uphold the following as well.


  1. https://medium.com/@nikhilbd/what-makes-a-good-data-scientist-engineer-a8b4d7948a86#.jr80wl98y

  2. https://arxiv.org/ftp/arxiv/papers/1612/1612.07140.pdf

  3. Additionally, textbooks are often not available when needed, or are too expensive to get your employer to purchase every time you want one.

  4. Making the right checklists can be difficult. Bad checklists could fall in the following categories – vague and imprecise; too long; hard to use; impractical; too pedantic. Good checklists are precise, efficient, easy to use and understand. This is the first time this course has been offered so we will have to work together to make sure requirements are good.