Frequently Asked Questions

Most likely, you have taken one or two programming courses before CSE 250. Unlike traditional computer science courses, CSE 250 uses Python interactively rather than to build standalone programs. The data provider usually has a few big questions that need answering; however, there are hundreds of little issues and responses along the way. We use programming to facilitate this investigation.

There are similarities with the work of user experience (UX) designers. In our case, however, we don’t get to ask users about their experience. Instead, we use programming to ask the data about its background, and each data set has its own history. We want our analysis to adapt to that history. You can think of data science programming as a first date with your data: you can’t write one long program in advance, naive to the issues and nuances that each living data set presents.

CSE 250 and CSE 350 have similarities. You could think of CSE 250 as an introduction to data wrangling and visualization. Both classes use real-world data and are built around data science projects. There are also some critical differences between the two courses.

  • In this course, we use Python, and CSE 350 uses R.
  • CSE 250 introduces the principles of data science programming.
  • CSE 250 is only 2 credits.
  • CSE 250 is intended to introduce visualization, wrangling, and modeling.
You will become comfortable with interactive programming and gain an introduction to the principles of data formats for data science applications. You will also be introduced to principles related to machine learning, data wrangling, and data visualization.
The course uses Python, with a focus on the pandas and Altair packages.
Under the new course sequence at BYU-I, the prerequisite is CSE 110. However, if you have programming experience from other classes, you are most likely prepared for this course.
The computer science and software engineering programs at BYU-I use Python in their foundational courses, so the typical student will have some experience with Python before CSE 250. Python is an essential programming language for data scientists, and we already have CSE 350/Math 335, which is taught in R.
pandas is the foundational data science package in Python. If you are working with tabular data, you will be working in pandas.
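As a small illustration of what working in pandas looks like, the sketch below builds a tiny table by hand and asks it a couple of questions interactively. The table, its column names, and its values are hypothetical and made up for illustration; they are not from a course data set.

```python
import pandas as pd

# A small, hypothetical table of flight delays (illustrative values only)
flights = pd.DataFrame({
    "airport": ["ATL", "ATL", "DEN", "DEN", "SLC"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
    "minutes_delayed": [120, 95, 60, None, 30],  # None becomes NaN
})

# Ask the data about its background: how many values are missing per column?
missing = flights.isna().sum()
print(missing)

# Summarize: average delay per airport (mean() skips NaN by default)
avg_delay = flights.groupby("airport")["minutes_delayed"].mean()
print(avg_delay)
```

In a notebook or interactive session, you would typically run each step on its own and look at the result before deciding what to ask next.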
Matplotlib was the first visualization package to gain a following in Python, and Seaborn is built on top of Matplotlib. Many data scientists use both in their work, but neither leverages the grammar of graphics as developed by Leland Wilkinson. Altair is built on Vega-Lite, which uses the Vega visualization grammar. It is declarative and actively developed. We expect that it will become the predominant visualization package in Python (https://youtu.be/FytuB8nFHPQ and https://youtu.be/vTingdk_pVM).