1 Getting Started
The tasks in this unit will help prepare you to do the following:
- Join the BYUIDSS Slack workspace, and begin communicating through Slack.
- Install and begin using both R and RStudio.
- Use both script (.R) and R Markdown (.Rmd) files to execute commands in R, as well as to knit .html and .md files.
- Install Git, join the byuistats workspace, and begin sharing your work on GitHub.
The case studies, which appear at the end of this unit, give you an opportunity to demonstrate mastery of the objectives above.
Work through the individual prep tasks before class starts so that you can contribute to the team discussion. The first-day tasks are the exception: they are designed so that you can complete the individual prep portions in class (you did not need to finish them beforehand) while getting to know your teammates.
1.1 Task: Communicate with Slack - Favorite Visualizations
We will communicate via Slack extensively throughout the semester and the data science program. You’ll introduce yourself to your team while also practicing communicating in the BYUIDSS Slack workspace.
- Join the BYUIDSS Slack workspace by heading to https://byuidss.slack.com/signup.
  - Use your @byui.edu email to create an account.
  - Create a professional username that you would be comfortable sharing with an employer.
  - Install Slack as an app on your computer (see https://slack.com/downloads/).
    Extra features are available when you install Slack as an app on your PC and/or phone. The more comfortable you become with using Slack, the easier it will be to collaborate with each other throughout your professional career.
- Class discussions will happen in an appropriate Slack channel (such as #ds350_s22_allen or #ds350_s22_woodruff).
  - Find the appropriate channel and add yourself to it.
  - Introduce yourself in the “Introductions” thread. Click on the thread, hit reply, and provide a brief introduction.
Introduce yourselves to your team. Name, major, where you’re from, why you chose Data Wrangling and Visualization, etc., are all great starter questions. Please take 5-10 minutes to learn more about each other, as you’ll be working together for at least the next 7 weeks. For some fun ice breaker questions, head to https://teambuilding.com/blog/icebreaker-questions, pick a question or two you want to answer, and then get to know each other.
- Pick a team name, and then have one person start a new thread in Slack that includes this team name. We’ll use this thread to practice using Slack in teams.
  - In our class channel, find the thread for your team.
  - Open your team thread and start playing around with Slack’s features. Type something and post it (it doesn’t matter what).
  - Use each of the features of the edit section, including HTML, CODE, and images.
  - Type a multi-line comment with more than one paragraph (if you can’t quite do it, no worries; ask each other for help).
- Do you have a favorite visualization (or several)? Look for it online, and then upload an image of it to Slack along with its URL.
- Discuss the following questions with your team.
- How would you define “data wrangling”?
- How would you define “data visualization”?
- How would you define “data scientist”?
1.2 Task: R and RStudio
Both R and Python are crucial tools for work in data science. In this course we’ll focus on building skills in R, and we’ll also gain some familiarity with using RStudio as a front end for coding in R. Today, let’s make sure everyone has both R and RStudio up and running and can execute a few commands. For some, this class will be their first time using R, while others may have already taken several courses that use it. Let’s help each other get up and running.
- Get the latest versions of R and RStudio running on your computer.
  - First install R at https://mirror.las.iastate.edu/CRAN/. Pick your platform and follow the instructions.
  - If you already have R installed, verify that you have the most recent version.
  - After R is installed, install RStudio at https://rstudio.com/products/rstudio/download/#download.
    Note that RStudio runs R inside of it and adds many tools that go beyond what R provides on its own. This is why R must be installed first, so that RStudio can use it.
- Adjust your RStudio settings to turn on code diagnostics (Tools > Global Options > Code > Diagnostics). Feel free to check every diagnostic for now, and then, throughout the semester, uncheck those you no longer find relevant.
  - Help each other troubleshoot any issues that arise while installing R and RStudio or updating the code diagnostics.
- Download the R script first-R-script.R and open it in RStudio.
  - Run each line of code, one at a time (figure out the keyboard shortcut to do this). It is not crucial that you understand what each line of code does at this point in the semester. Before running each line, briefly try guessing what it does, but don’t worry if you don’t understand all (or any) of the code.
  - Discuss any questions you may have related to the code in the script file.
- Answer the question, “Which of the variables (in the mtcars data set) are the biggest contributing factors to a lower mpg (worse gas efficiency)?” Construct several visualizations that you feel would best help convince others of your answer. Share your visualizations and code with each other (Slack is a great place to share code).
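One quick starting point, using only base R, is to rank every other mtcars variable by its correlation with mpg and then plot the strongest candidate. Correlation measures association rather than cause, so treat this as a way to decide which visualizations to build, not as the final answer. The name mpg_cor is just an illustrative choice.

```r
# mtcars ships with base R, so no packages are needed here.
# Correlation of every variable with mpg, taken from the full correlation matrix
mpg_cor <- cor(mtcars)[, "mpg"]
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]   # drop mpg's correlation with itself

# Order by strength of association, ignoring the sign
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
round(mpg_cor, 2)

# Negative values: as the variable increases, mpg tends to decrease.
# A scatterplot with a fitted line makes the strongest relationship visible.
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Heavier cars tend to get lower mpg")
abline(lm(mpg ~ wt, data = mtcars), col = "red")
```

Variables near the top of the list with negative values are the natural candidates to visualize next, for example with more scatterplots or with boxplots of mpg by cyl.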
1.3 Task: R Packages for Visualization
Installing R gave us only the core of the tools that the open-source community has built for analyzing data. In this task, we’ll start by installing the tidyverse package (which may take a few minutes to fully install). Then we’ll run some coding examples that illustrate the basic data workflow: load, wrangle, then visualize. In addition to installing packages, one of the main goals of this task is to make sure each member of your team can knit an R Markdown (.Rmd) file.
It is not crucial that you understand what each line of code does (at this point in the semester).
- Download the zip file first-rmarkdown.zip and unzip it. Then open the file first-rmarkdown.Rmd in RStudio.
- Once you open the file, you may see messages saying that some needed packages are missing. Feel free to click the messages that offer to install the packages, or install them manually by running the following lines once in your R console.

```r
install.packages("tidyverse")
install.packages("leaflet")
```

- Once you have all the packages installed, click Knit to create an .html document from this .Rmd file. The first time you click Knit, you may need to install some other packages as well. An .html document should appear.
- If anyone on your team hits a snag in loading packages and/or knitting, do your best to troubleshoot the issue together. Don’t hesitate to call your instructor over if needed.
- Go through the .Rmd file and evaluate each code chunk, one by one, to get used to how .Rmd files work in RStudio.
- Modify the code chunks in the visualization section to address the question, “Which state(s) had the most explosive outbreak of COVID, and when did it occur?”
  - With your team, share the visualizations you created. Make a decision, backed by a visualization, that compares the outbreaks in several key states.
  - What variables, if any, would help make a more informed comparison?
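“Explosive” can be defined in several ways. One possible definition, sketched below, is the largest single-day jump in new cases. The sketch assumes cumulative counts stored in hypothetical columns named state, date, and cases; the data in first-rmarkdown.Rmd may be organized differently, and the numbers below are made up purely to show the computation.

```r
# Hypothetical cumulative case counts for two made-up states
covid <- data.frame(
  state = rep(c("State A", "State B"), each = 4),
  date  = rep(as.Date("2020-11-01") + 0:3, times = 2),
  cases = c(100, 150, 400, 450,   # State A (cumulative)
            200, 260, 330, 410)   # State B (cumulative)
)

# Convert cumulative totals to daily new cases within each state
covid <- covid[order(covid$state, covid$date), ]
covid$new_cases <- ave(covid$cases, covid$state,
                       FUN = function(x) c(NA, diff(x)))

# For each state, keep the day with the largest single-day jump
# (which.max() skips the NA on each state's first day)
peaks <- do.call(rbind, lapply(split(covid, covid$state), function(d) {
  d[which.max(d$new_cases), c("state", "date", "new_cases")]
}))
peaks
```

In tidyverse style, the same idea would use dplyr’s group_by() with mutate() and lag() to compute the daily differences.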
Play around with the leaflet package together. I strongly suggest copying and pasting the examples from https://rstudio.github.io/leaflet/ and then changing portions of them. See if you can create a map that shows where each member of your team currently lives, where your favorite restaurants are, or anything else.
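As a starting point, the sketch below follows the basic leaflet pattern (leaflet(), then addTiles(), then addMarkers()) but reads the markers from a small data frame. The place names and rounded coordinates are placeholders; swap in your team’s own locations.

```r
library(leaflet)

# Placeholder locations; coordinates are approximate (longitude, latitude)
places <- data.frame(
  name = c("Rexburg, ID", "Auckland, NZ"),
  lng  = c(-111.78, 174.77),
  lat  = c(43.82, -36.85)
)

# leaflet() creates the map widget, addTiles() adds the default
# OpenStreetMap base layer, and addMarkers() drops one pin per row.
# The ~ formulas tell leaflet to look up those columns in `places`.
team_map <- leaflet(places) %>%
  addTiles() %>%
  addMarkers(lng = ~lng, lat = ~lat, popup = ~name)

team_map  # printing the widget shows it in the RStudio Viewer pane
```

Adding a row to places adds a pin, so each teammate can contribute one line to the data frame.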
1.4 Task: Let’s Git going
Git is a program that helps you organize code and keep track of any changes you make. Git uses folders called “repositories” to organize code into different projects. GitHub is an online platform where you can store your Git repositories, and then access them from any computer. You will use Git and GitHub this semester to submit your homework, in a repository made specifically for this class.
Download and install Git from the official website.
- Go to GitHub and create an account.
  - Use a professional username that you would be comfortable sharing with an employer.
- Once you are logged into GitHub, try making your own repository by clicking the plus sign (+) in the top-right corner of your screen.
  - Use any name and description you want. You can delete this repository later.
  - Check the “Add a README file” option before you create your repository.
  - Now click “Add file” (next to the green “Code” button) and try uploading a file to this new repository.
- Now that you have practice making a repository, it is time to make one for this class. This will be done automatically for you, because we want your class repository saved in a special organization.
  - Go to this Google Sheets file and fill out your information. After roughly 10 minutes, you should receive two emails.
  - One email will be an invitation from “GitHub” to join the BYUI335 organization. Accept the invitation.
  - The other email will be an invitation from “Katie Allen” to collaborate on a repository with your name on it. Accept the invitation.
  - If you do not see the two emails, check your spam folder.
- You now have a repository where you can upload all of your homework! In the future you will upload daily work using RStudio. For now, let’s try manually uploading a file.
  - Navigate to the class repository that you were invited to collaborate on (search for it here).
  - Just as you did with your practice repository, click the “Add file” and “Upload files” buttons. Try uploading a file (any file!).
- Follow the steps in this video to get RStudio working with Git and GitHub. This will allow you to add files and make edits to your class repository from within RStudio.
  - Here is a summary of the key commands from that video. Run these install lines once, and then load the libraries.

```r
install.packages("usethis")
install.packages("gitcreds")
library(usethis)
library(gitcreds)
```

  - Modify your Git config file so that it contains the username and email associated with your GitHub account (usethis::edit_git_config() opens this file for you).

```
[user]
    name = yourusername
    email = youremail@something.abc
```

  - Create a GitHub token, and set an expiration date of at least 90 days so that it lasts the entire semester. You can leave the other default options.
  - The previous step gives you a token. Run the next line and paste the token in when prompted.

```r
gitcreds_set()  # paste your token in when prompted
```
- After watching the video and following the steps above, clone your class repository (you can find it here) into RStudio. Try making an edit to the README.md file and “pushing” the change online.
  When you edit a file on your computer, you can save a history of your changes using the Git commands add and commit. To get those changes to show up online, you have to push. If instead you make an edit to your online files and then want those changes to show up on your local computer, you need to go the opposite direction and pull. As the semester goes on, you will fall into the habit of regularly running pull, stage/add, commit, and push.
- In RStudio, edit your README.md file to include a short description of yourself (for example, “Hi! My name is Whitney and I like…”). Then push your edits, and make sure you can see the description in your online class repo.
- Now edit your README.md file online. Below your description, add links to the repos of everyone in your group (you may need to finish this part in class).
- Go back to RStudio. Notice that the repo links do not appear in your local README.md.
- Using the Git tab in RStudio, pull the online changes you made.
- Make one more change to README.md by adding the name of your group, and push the change online.
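RStudio’s Git tab runs these commands for you. For anyone curious about what happens underneath, the sketch below performs add and commit from R with the gert package (which usethis uses for its Git operations) in a throwaway repository inside a temporary folder. The folder name, file contents, and signature are placeholders. Pushing and pulling need a GitHub remote, so they appear only as comments.

```r
library(gert)

# Placeholder repository in a temporary folder (safe to experiment in)
repo <- file.path(tempdir(), "git-demo")
dir.create(repo, showWarnings = FALSE)
git_init(repo)

# Edit a file, then stage ("add") it
writeLines("Hi! My name is ... and I like ...", file.path(repo, "README.md"))
git_add("README.md", repo = repo)

# Commit the staged change with a message; the signature is a placeholder
git_commit("Add a short description to README",
           author = git_signature("Your Name", "you@example.com"),
           repo = repo)

# The commit is now part of the local history
git_log(repo = repo)$message

# With a GitHub remote configured, you would then move changes across:
# git_push(repo = repo)   # local commits -> GitHub
# git_pull(repo = repo)   # GitHub edits  -> local copy
```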
1.5 Task: Up and Running with R
This task is intended to get all of us on the same page in R. As you work through the task, the readings below can help answer questions and provide direction.
- R for Data Science, Chapter 4: Workflow Basics
- ModernDive, Chapter 1: Getting Started with Data in R
Please start by skimming each document to get a feel for what topics are addressed.
- Download the zip file up-and-running-with-r.zip. Extract the .Rmd file and save it in your class repository in the appropriate week folder. Then open the .Rmd file in RStudio. (Note: We store .Rmd files in .zip files because of a GitHub issue that removes the YAML header from the top of an .Rmd file when it is downloaded directly.)
- Spend an hour working on answering the 12 questions, taking notes on the readings, etc.
  - Remember that your goal is to become more familiar with R, so it’s perfectly fine if you get stuck on some of the questions. Slack is a great place to ask questions as we’re learning together, so please communicate there. Come to class ready to share what you’ve learned and ask for help on the things you are still stuck on.
- The last step of every prep task will be to share your work with the class in your GitHub class repository.
  - Knit your document, which should produce an .html file along with an .md file. The .md file allows quick previewing of your work on github.com.
  - Stage and commit all your work (.Rmd, .html, and .md files).
  - Push your committed work to your class repository on github.com (the green up arrow in RStudio’s Git tab).
  - If you were unable to get RStudio and Git to communicate, then manually upload all your work to your online class repository using the “Add file” button on github.com, and reach out on Slack with questions.
- Head to GitHub.com and locate your repository (you can search for it here).
  - Verify that your .md file appears, with your work and solutions properly displayed on GitHub.
  - Locate the repo of someone else in the course, and verify that their .md file appropriately shows their work. If something seems amiss, you can use the “Issues” tab in GitHub to let each other know that something needs addressing.
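Clicking Knit calls rmarkdown::render() behind the scenes. An .md file is kept next to the .html only if the output format asks for it; for html_document that is the keep_md option (in the YAML header, keep_md: true nested under html_document). The sketch below writes a tiny placeholder .Rmd to a temporary folder and renders it from the console with that option, which may help if your knit produces an .html but no .md. It assumes Pandoc is available, as it is when RStudio is installed.

```r
library(rmarkdown)

fence <- strrep("`", 3)   # R Markdown chunk delimiter: three backticks

# A tiny placeholder .Rmd written to a temporary folder
rmd <- file.path(tempdir(), "knit-demo.Rmd")
writeLines(c(
  "---",
  "title: \"Knit demo\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd)

# keep_md = TRUE keeps the intermediate Markdown file next to the .html
render(rmd, output_format = html_document(keep_md = TRUE), quiet = TRUE)

# Both files should now exist side by side
file.exists(file.path(tempdir(), c("knit-demo.html", "knit-demo.md")))
```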
1.6 Task: The Mock Case Study
Case studies are a chance for us to ponder and prove (the final step of the BYUI learning model). Each case study will have an accompanying task, such as this one, which will include a “Being” reading assignment that we’ll have a discussion about during class.
One goal of this mock case study is to familiarize each of us with the case study process. The skills you’ll practice in this case study are creating your own R Markdown (.Rmd) file, appropriately updating the YAML and knit options, and verifying that you’ve got Git working as needed for case study submission in I-Learn throughout the semester.
Complete the Being reading section of the Mock Case Study. This will require that you read an article, and come to class with two or three things to share.
Complete the tasks in the case study. If you get stuck on any of the tasks, please ask questions in Slack.
During class, make sure you can locate each team member’s .md file on GitHub in their class repository. Help each other get to this stage. For the rest of the semester, we’ll use GitHub to share our preparation with each other. Every task will involve some item that needs to be pushed to your class repository.
In I-Learn, submit the Mock Case Study assignment.