Day 1: Introduction

My family

My Education and Employment History

  • 2003: Undergraduate degree in Economics (er, Socialist History) from the U.
  • 2003-2005: Master’s degree in Statistics from BYU.
  • 2005-2015: Pacific Northwest National Laboratory
  • 2015-Current: BYU-I
  • 2015-Current: Data Driven Consulting (Child Health, Environmental Sampling, Business Consulting)

My interests

My learning manifesto

Will you take a minute and read my manifesto and then write one sentence, guessing what I expect from you in this class?

Father Sarducci on Education

Are we all on the Slack Channel?

byuidss.slack.com

What is Slack?

Think of Slack as a chatroom shared among all the members of an organization.

Your organization is known as a “workspace” and is divided into “channels,” which are separate group chats with their own members and topics.

In these channels, you can send messages, images, internet links, videos, and more. They’re designed to make communication between employees seamless, and replace many of the functions that email once dominated. ref

Slack with the BYU-I Data Science program

Just use it! Get the App (Apple, Android, Desktop) and keep it on.

Learn to ask questions. Someone will answer.

Slack Pinned Items

Welcome to CSE 250! Here are some key links for our course.

What is a data scientist?

A blend of programmer, statistician, and communicator that burns with curiosity

What is this article saying about the data science skills that are needed? Find the table and the text in the section above it.

What is data science programming?

Data scientists write code as a means to an end, whereas software developers write code to build things. Data science is inherently different from software development in that data science is an analytic activity, whereas software development has much more in common with traditional engineering. Data scientists tackle problems such as identifying fraudulent transactions, or predicting which employees are likely to leave a company. Software developers can take the data scientists' models and turn them into fully functioning systems with production-quality code. Software developers tackle problems like getting an algorithm to run more efficiently, or building user interfaces.

More important than the title is this: if you hire a data scientist and expect them to be a software developer, you are wasting a lot of your time and money. If you hired a data scientist, put them to work identifying opportunities to use data science in your organization. reference

An overview of CSE 250 - Data Science Programming

Course Outcomes

Upon completing this course, you will be able to use data-driven programming in Python to handle, format, and visualize data. We will introduce you to data wrangling techniques (pandas), analytical methods (scikit-learn), and the grammar of graphics (Altair). Specifically, as a successful learner, you will be able to:

  1. Use functions, data structures, and other programming constructs efficiently to process and find meaning in data.
  2. Programmatically load data from various types of data sources, including files, databases, and remote services.
  3. Use data manipulation libraries to perform straightforward analysis, produce charts, and prepare data for machine learning algorithms.
  4. Use machine learning libraries to discover insights, make predictions, and interpret the success of these algorithms.
  5. Use industry-leading tools to collaborate and share your work.

Principles of DS teaching

The course follows these principles of teaching Data Science.

  • Organize the course around a set of diverse projects
  • Integrate computing into every aspect of the course
  • Teach abstraction, but minimize reliance on mathematical notation
  • Structure course activities to realistically mimic a data scientist’s experience
  • Demonstrate the importance of critical thinking/skepticism through examples

The reality of CSE 250

  1. We have done all we can to ensure that this is a 2-credit course for the average student. That means that we expect 4-5 hours outside of class for the average student to achieve an A. You have to put in the time if you want to build skills.
  2. The course is necessarily creative in nature. That fact usually makes it feel more challenging. We will be asking you to learn to write creative data science Python code.
  3. If you have any concerns, please talk with me.

Ok, then what is the structure of CSE 250?

The class uses 6 projects to teach data science programming in Python using pandas, Altair, scikit-learn, and numpy (listed in order of use within the class).

How do I get the grade I want?

What do I need to have done by our next class period?

Have the Introduction Project completed.

Have VS Code set up and ready for data science programming, with the packages installed and one Altair chart recreated.
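
One quick way to confirm the "packages installed" part is to ask Python whether it can find each library. The sketch below is only an illustration, not part of the official Introduction Project: the function name check_packages and the COURSE_PACKAGES dictionary are made up for this example, and the package list is assumed from the libraries named in the course overview (pandas, Altair, scikit-learn, numpy).

```python
from importlib.util import find_spec

# Assumed package list, taken from the libraries named in the course overview.
# Keys are display names; values are the names used in an import statement.
# Note that scikit-learn is imported as "sklearn".
COURSE_PACKAGES = {
    "pandas": "pandas",
    "Altair": "altair",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
}


def check_packages(packages=COURSE_PACKAGES):
    """Return a dict mapping each display name to True if Python can find the module."""
    return {name: find_spec(module) is not None for name, module in packages.items()}


if __name__ == "__main__":
    # Print one line per library so missing installs are easy to spot.
    for name, installed in check_packages().items():
        status = "ok" if installed else "MISSING"
        print(f"{name:<14}{status}")
```

Run the file from the VS Code terminal using the same Python interpreter you selected for the class. Anything reported as MISSING still needs to be installed in that interpreter's environment. If you get stuck, ask on Slack.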