By the end of the semester, each student will be able to:
- Integrate and extend previously learned data science tools to analyze remote and distributed data in business contexts.
- Explore, interpret, conceptualize, and validate assumptions of data at scale.
- Understand the differences and benefits of current industry technologies for big data storage and analysis.
- Leverage parallel processing for analysis.
The course follows these principles of teaching Data Science:¹
- Organize the course around a set of diverse case studies
- Integrate computing into every aspect of the course
- Teach abstraction, but minimize reliance on mathematical notation
- Structure course activities to realistically mimic a data scientist’s experience
- Demonstrate the importance of critical thinking/skepticism through examples
You will find value in reading my learning manifesto.
We assume that you have experience using data science programming in Python as practiced in CSE 250. You will also need a background in data science programming in R as practiced in CSE 350 / Math 335 or experience with Machine Learning as practiced in CSE 450. You can see all the prerequisites at the BYU-I Catalog.
This course assumes that you are capable of guided learning and working in teams.
In my experience, getting lectured training outside of college is even more expensive than it is in college. A week’s worth of training can cost more than a semester of school here at BYU-I. I expect that you have completed the assigned reading material before class begins.
There will be a few coding challenges that pop up at the beginning of class during the semester to make sure you are keeping up with the material.
The goal is to avoid traditional lectures in class. We will use class time for the following team activities.
These presentations are not expected to be high-impact proposals with highly polished slides. However, they should be organized and clear, as your slides will persuade the class to move with your group’s decision.
Each partner group will provide one 40-60 minute training on the class-selected learning topics. These presentations should have a hands-on coding activity and be self-contained in a GitHub repo within our CSE451 GitHub organization.
The grading system’s influence on our thinking is a side effect of mass learning and academia. We are in a class at an accredited university and will have to manage this side effect. However, we don’t have to let it control our learning, thinking, or work. Discovering and practicing industry-pertinent skills should motivate each activity.
Class performance is tracked in four areas - impact, involvement, hours, and understanding. These areas generally map to how you will be valued at your future employer. Each area is essential to maximize your perceived performance, but not all areas need to be exceptional to earn the highest marks in this course or to survive in industry.
If your team doesn’t understand why they need your services, they will eventually not need you.
Concept: Your team will make decisions and assignments. Make sure your team feels like you are an equal contributor. Contributors are measured by taking responsibilities and delivering on those responsibilities. It is ok to contribute more on some projects and less on others, but your team should feel like you typically make significant contributions.
Class: We will have multiple elements of our big project and 5-6 small projects where you will get the opportunity to be the primary contributor. A primary contributor is someone who provides more material and results for the project than 50% of the group members.
If your team and manager don’t see and hear your ideas and work, they will question your leadership and interest.
Concept: Do your work before the meetings and come prepared to listen and direct the planning. Team meetings are not a time to stay quiet in an effort to be polite or avoid looking dumb. Get involved, ask questions, and provide answers.
Class: The specifics of what should be done for this element are harder to explain. I will reach out to you directly if you are not meeting expectations in your group involvement.
Putting in the time is the best predictor of success.
Concept: Most employers expect you to work many hours each week. If they only wanted clearly specified products, they would hire consultants to deliver the product. As a full-time employee, you will be given the space to figure out new domains and then guide the group on their implementation. But you have to guide your work. As a data scientist, each day will have new unique challenges.
Class: Full-time employment for a 3-credit class at BYU-I is 9 hours a week (6 outside class and 3 in class). Putting in full hours all semester will be a crucial element in defining your final grade. Excellent performance in the other three areas could help you achieve the highest marks without meeting the full hours (but generally you will need to put in the hours to do well in the other three).
You should know how to do things. But not everything.
Concept: When you are on a team, you should earn a reputation for knowledge in a few specific areas. You want to be the person that everyone knows they can ask to get the right answer. Find your niche and hone your skills. You should find moments to offer your help in these areas.
Class: We will have coding challenges during class. Some will take the entire period, and others will be short concepts before we start class. All challenges will be announced at least 24 hours before the class period they occur, along with a programming topic. Also, you are responsible for helping your team members debug their code during the projects.
The table below summarizes the specifications-based grading for the course. You should read the details below for additional understanding.
Grade | Hours | Understanding | Involvement | Impact |
---|---|---|---|---|
A | 107 | 3 & 4 | < 3 warnings & < 3.1 hours class missing | Active all & key > 2 |
B | 98 | 3 & 3 | < 9.1 hours class missing or write-up | Active most & key > 1 |
C | 75 | 3 anytime | < 4 warnings | Active often & key > 0 |
D | 50 | – | – | – |
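As a rough illustration of how the Hours column reads, here is a minimal Python sketch. It assumes each number in that column is a minimum number of logged hours, which the table does not state explicitly. The names `HOURS_THRESHOLDS`, `hours_tier`, and `full_time_weeks` are hypothetical and not part of any course tooling. The sketch covers the Hours column only; the final grade also depends on Understanding, Involvement, and Impact.

```python
# Hours thresholds copied from the Hours column of the grading table.
# Assumption: each value is a minimum (the table gives no comparison sign).
HOURS_THRESHOLDS = [("A", 107), ("B", 98), ("C", 75), ("D", 50)]


def hours_tier(hours_logged):
    """Return the highest letter whose Hours threshold is met, or None."""
    for letter, minimum in HOURS_THRESHOLDS:
        if hours_logged >= minimum:
            return letter
    return None


def full_time_weeks(hours_logged, hours_per_week=9):
    """Convert logged hours into weeks of 9-hour 'full-time' class effort."""
    return hours_logged / hours_per_week


if __name__ == "__main__":
    # Print the Hours tier and the equivalent full-time weeks for a few totals.
    for hours in (119, 107, 98, 50, 49):
        print(hours, hours_tier(hours), round(full_time_weeks(hours), 1))
```

The 9 hours per week default comes from the full-time definition given for the Hours area. Dividing a threshold by it shows roughly how many full weeks of effort each tier represents.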
A Details:
B Details:
C Details:
D Details:
The coding challenges will be graded on a four-point scale - 1) Submitted work, 2) Some code aligns with the challenge, 3) Strong performance with satisfactory code, 4) Near flawless performance with clean and concise code.
If you feel you have greatly exceeded one of the competency areas, you can use that excess to negotiate a shortcoming in a different competency. Here are a few examples you could argue (these are example arguments and are not intended to signify a path to the grade requested).
I only got a satisfactory score on my final coding challenge, but I completed 119 hours and was a key contributor on 5 projects. As such, I request an A.
I was only recognized as a key contributor on one project. However, I worked 107 hours and stayed involved in all work during class. As such, I request a B.
I only worked 50 hours in this class. However, I got all 3s on my coding challenges and a 4 on the final coding challenge. In addition, I was a key contributor on 5 projects and never missed class. I request an A-.