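library(tidyverse) # attaches the core tidyverse: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats (plus lubridate in tidyverse >= 2.0.0)
library(readxl)    # installed with tidyverse, but not attached by library(tidyverse)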
R/R-Studio Help
Introduction
R-Studio and R Markdown First Use: After you have installed R and R-Studio, this page will help you build your first .Rmd file. We will use this type of file heavily throughout the semester. If you are less familiar with R, use the information below to get up to speed.
Additional links
- R Cheat Sheets: This page has links to one-page guides to different R packages and data science tools.
- Google’s R Style Guide: A guide from Google on how to write clean code that communicates clearly with other people. Think The Elements of Style.
- R Visualization: A collection of links about R packages for visualization.
R language background
Beyond being FREE and the predominant statistical software used in science, industry, research, and business, R has great pedagogical advantages. Daniel Kaplan summarized one of its biggest advantages for learning and teaching in the following quote.
In mathematics and statistics, the output of one computation often becomes the input to another computation. That’s why math courses spend so much time talking about functions (and “domain” and “range”, etc.). In word processing, whenever you highlight a word and move it or change the font or replace it, you still end up with stuff on which you can perform the same operations: highlighting, moving, font-changing, etc. Not so in math and statistics. The sorts of operations that you will often perform - solving, integration, statistical summaries, etc. - produce a new kind of thing on which you will be performing new kinds of operations. In mathematics and statistics, you create a chain of operations and you need to be able to express the steps in that chain. It’s not a question of having enough buttons to list all the operations, you’ll need combinations of operations - more than could possibly be listed in a menu system.¹
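To make Kaplan’s point concrete in R: a chain of operations can be written as nested function calls, read from the inside out, or with the native pipe |> (available in R 4.1.0 and later), read left to right. The input numbers below are arbitrary and chosen only so the result is easy to check by hand.

```r
# Arbitrary input values
x <- c(4, 9, 16, 25)

# Nested calls: read from the inside out (square root, then mean, then round)
nested <- round(mean(sqrt(x)), 1)

# Native pipe (R >= 4.1.0): the same chain, read left to right
piped <- x |> sqrt() |> mean() |> round(1)

nested                    # [1] 3.5
identical(nested, piped)  # [1] TRUE
```

Each step returns a new object that the next function consumes, which is exactly the kind of chain a menu system cannot express.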
For those of you coming from a computer science background, John D. Cook offers some useful insight:²
I have written software professionally in perhaps a dozen programming languages, and the hardest language for me to learn has been R. The language is actually fairly simple, but it is unconventional.
We can also track the competition between R and Python here.
For those of you coming from a business or engineering background, you should know that the difficulty of coding in Excel has contributed to some serious blunders in globally impactful discussions [1,2,3], and that Excel has serious flaws in how it handles data in the first place (for example, silently converting some text entries, such as certain gene names, into dates).
Below are some references from reputable online sources about the impact and usefulness of R.
- Fast Company Article on R
- “R can do literally everything, and all new research is done in R. So especially for businesses that really want to out-compete their competitors on the basis of advanced analytics, they can get access to everything they need within R, things that might not come for five or 10 years through commercial software,” says Smith.
- New York Times on R
- “The great beauty of R is that you can modify it to do all sorts of things,” said Hal Varian, chief economist at Google. “And you have a lot of prepackaged stuff that’s already available, so you’re standing on the shoulders of giants.”
- InfoWorld
- Still, Adams and Peng both see R as an accessible language. “I don’t come from a computer science background and never had aspirations of becoming a programmer. Knowledge of programming fundamentals certainly helps when adding R to your toolbox, but I wouldn’t say it’s required to get started,” Adams says.
- “I wouldn’t even say R is for programmers. It’s best suited for people that have data-oriented problems they’re trying to solve, regardless of their programming aptitude.”
Learning R
We will spend time in class all semester learning new syntax. During the first two weeks, we will spend extra time on the basics of R and how to use it, which is often the most challenging part. Daniel Kaplan’s book also has a nice introduction.
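To give a flavor of those basics, the sketch below shows assignment with <-, a vector built with c(), vectorized arithmetic, a small user-defined function, and a data frame. The heights and weights are made-up values for illustration only, and bmi() is a hypothetical helper written for this example, not a function from any package.

```r
# Assign values with <- and combine them into a vector with c()
# (made-up heights in centimeters, for illustration only)
heights_cm <- c(160, 172, 181, 168)

# Arithmetic is vectorized: it applies to every element at once
heights_m <- heights_cm / 100

# A small user-defined function (hypothetical helper, not from a package)
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# A data frame stores related vectors as named columns
people <- data.frame(
  height_m  = heights_m,
  weight_kg = c(55, 70, 80, 62)  # made-up weights
)

# Add a new column computed from existing columns
people$bmi <- round(bmi(people$weight_kg, people$height_m), 1)
people
```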
R, R-Studio, .qmd files, and .R scripts
We will need to install R and R-Studio and get comfortable working with .qmd files and .R files.
R
- The R website is where we can go to find the latest version to download for your particular operating system.
- Here are videos for Mac users and Windows users. There is also a Linux version of R.
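Once R is installed, a quick check from the R console confirms which version is running. The 4.1.0 threshold below is only an example (it is the release that introduced the native pipe |>), not a stated course requirement.

```r
# Human-readable description of the running R version
R.version.string

# TRUE if this R is at least version 4.1.0 (an example threshold only)
getRversion() >= "4.1.0"
```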
.R Scripts
.R script files are the standard file type for saving your R code. This is often the file I start with for any analysis. In fact, if you have experience with .Rmd files, you can write a .R script in such a way that it can be built into an HTML page, just like a .Rmd file. See here.
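A minimal sketch of such a script, assuming the knitr "spin" conventions: lines starting with #' become Markdown text, a line starting with #+ begins a new code chunk (optionally with a label), and everything else is ordinary R code. The file name first_analysis.R is hypothetical. Rendering it with rmarkdown::render("first_analysis.R"), or with R-Studio's Compile Report command, should produce an HTML page, while running it with source() simply executes the code.

```r
#' # A first analysis script
#' Lines that start with `#'` become Markdown text when this script is
#' rendered; everything else is run as ordinary R code.

#+ summarize-mpg
# Average fuel economy by number of cylinders, using the built-in mtcars data
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
mpg_by_cyl
```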
R-Studio & qmd Files
- R-Studio is great software for working with R (and many other things). We will use R-Studio heavily in this course! Please download R-Studio.
- .qmd files (Quarto documents) are your passport to a wide range of data presentation options.
R Help Files
R packages
One of the greatest features of R is the open-source development of additional functions that can be easily shared through packages. We will use a wide variety of packages in this class. This page lists the primary packages we will be using. Each package must first be installed into your local copy of R using install.packages("PACKAGENAMEHERE") and then loaded in each new R session with library(PACKAGENAMEHERE).
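A common pattern is to install only the packages that are not already on your computer. The two helpers below, missing_pkgs() and install_missing(), are hypothetical names written for this sketch; they rely on base R's system.file(), which returns an empty string when a package is not installed.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
# system.file(package = p) returns "" when package p is not installed.
missing_pkgs <- function(pkgs) {
  paths <- vapply(pkgs, function(p) system.file(package = p), character(1))
  pkgs[!nzchar(paths)]
}

# Hypothetical helper: install only the missing packages, and
# invisibly return the names that were passed to install.packages()
install_missing <- function(pkgs) {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}
```

For example, install_missing(c("tidyverse", "readxl")) installs whichever of the two is missing and does nothing if both are already installed. You still need library() in each session to load them.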
tidyverse
Our book relies heavily on this wrapper package; see tidyverse.org for details. It is a simple way to load a set of core packages with a single library(tidyverse) call, and tidyverse.org has a page describing each of them. Running install.packages("tidyverse") installs many more packages than library(tidyverse) loads (over 35).
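As a small taste of the tidyverse style, the sketch below uses dplyr, one of the core packages that library(tidyverse) attaches. Only dplyr is loaded here to keep the example minimal, and the built-in mtcars data stands in for course data.

```r
library(dplyr)  # one of the core packages attached by library(tidyverse)

# Average fuel economy and number of cars for each cylinder count
mpg_summary <- mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg), n_cars = n())

mpg_summary
```

Each verb takes a data frame and returns a new one, so the steps chain naturally with the pipe.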
readxl
This package is installed by install.packages("tidyverse"), but library(tidyverse) does not load it, so you also need to call library(readxl). The readxl package is the primary package we will use in class for reading Excel files. It may be useful to know that tidyxl and XLConnect provide much more comprehensive interaction with Excel workbooks. I have used the xlsx library as well. Note that the XLConnect and xlsx libraries require Java to be installed on your computer.
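A minimal sketch of typical readxl use. It relies on the example workbook that ships with readxl (located with readxl_example()), so it runs without any course data files.

```r
library(readxl)

# readxl ships with example workbooks; readxl_example() returns the file path
path <- readxl_example("datasets.xlsx")

# List the worksheets in the workbook
sheets <- excel_sheets(path)
sheets

# Read one sheet by name into a tibble
mtcars_xl <- read_excel(path, sheet = "mtcars")

# Read only the first 5 rows of the first sheet
first_rows <- read_excel(path, n_max = 5)
```

For your own data, replace path with the location of your workbook, for example read_excel("data/my_workbook.xlsx"), where that path is hypothetical.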