9 Clean and Reformat (aka Tidy) Stock Data
Readings
- Chapter 11.5 Writing Data: R for Data Science (2nd ed)
- Chapter 12 Tidy Data: R for Data Science (1st ed)
- (optional) For more complex pivoting examples, read the 2nd edition chapter: Chapter 5 Data Tidying: R for Data Science (2nd ed)
- (optional) A more robust function for extracting strings can be found in Chapter 14 Strings: R for Data Science (2nd ed). Section 14.4 is particularly relevant to this task.
- RDS vs. RData
  - By the way, contrary to what the article suggests, I don't believe saving your workspace is a good idea unless you think you'll need it. It just slows down your computer. That's the point of a script or markdown file: all the code is already saved, so you can reproduce the workspace if necessary.
- tidyr cheatsheet
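The RDS vs. RData reading contrasts the two file types. Below is a minimal base R sketch of the practical difference. The objects x and y, and the temporary file paths, are throwaway placeholders and are not part of the assignment.

```r
# Two small placeholder objects to save.
x <- c(1, 2, 3)
y <- "hello"

rdata_path <- tempfile(fileext = ".RData")
rds_path <- tempfile(fileext = ".rds")

# .RData: save() stores several objects together with their names.
save(x, y, file = rdata_path)

# load() recreates the objects under their original names. Here they go into a
# fresh environment so nothing in the current workspace is overwritten.
env <- new.env()
load(rdata_path, envir = env)
ls(env)  # returns "x" "y"

# .rds: saveRDS() stores exactly one object, without its name.
saveRDS(x, rds_path)

# readRDS() returns the object, so you choose the name when you assign it.
my_numbers <- readRDS(rds_path)
```

This difference matters for the task. An .rds file holds a single object, such as one tidy data frame, and you decide what to call it when you read it back in.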
Guided Instruction
In 1973 Princeton University professor Burton Malkiel said:
In 1990 the Wall Street Journal took Malkiel up on his challenge. The Journal ran the contest until 2002, pitting stocks picked by randomly thrown darts against experts' picks.
We have access to stock return data through 1998. Open the dataset and get familiar with it. The "PROS" rows contain the returns professional investors achieved for that period. The "DARTS" rows contain the returns of stocks selected by monkeys randomly throwing darts. The "DJIA" rows contain returns for the Dow Jones Industrial Average. The Dow Jones Industrial Average is a group of 30 stocks of large companies with stable earnings. This "index" of stocks is one of the oldest, most closely watched indices in the world and is designed to serve as a proxy, or indicator, of the United States economy in general. (Learn more about "the Dow".)
We want to look at the returns for each six-month period of the year in which the returns were reported.
- Use the appropriate function in library(readr) to read in the .RDS file found on GitHub.
  - Depending on your computer, you should use read_rds(url("WEBLOCATION.RDS")) or read_rds("WEBLOCATION.RDS") to download and read the .RDS file type. Remember, R is case sensitive. Replace WEBLOCATION.RDS with the correct URL.
  - When using file paths to read files in from GitHub (and other locations), you must pay attention to 'raw' vs. 'blob' (see this reading).
- The contestant_period column is not "tidy." We want to create month_end and year_end columns from the information it contains.
- Save your "tidy" data as an .rds object. (As an optional challenge, see if you can read in the saved file!)
- Use code to create a table of the DJIA returns that matches the table shown below (apply pivot_wider() to the data). Pay attention to detail.
- Render the .qmd file. Push all the files created in the rendering process into your GitHub repository.
  - Don't be surprised if the table doesn't render well in the .md file on GitHub. That is okay, as long as it looks correct in your .html preview.
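For the reading step, the 'raw' vs. 'blob' distinction is what usually trips people up. A 'blob' URL points to GitHub's web page for the file. A 'raw' URL serves the file itself, which is what read_rds() needs. The sketch below uses a hypothetical helper, blob_to_raw(), with placeholder OWNER/REPO names. It assumes the usual https://github.com/<owner>/<repo>/blob/<branch>/<path> layout. This is only an illustration; copying the link behind GitHub's "Raw" button works just as well.

```r
# Hypothetical helper (not from readr or the assignment). It converts a GitHub
# "blob" page URL into the matching "raw" file URL.
# Assumes the layout https://github.com/<owner>/<repo>/blob/<branch>/<path>.
blob_to_raw <- function(blob_url) {
  # Swap the web-page host for the raw-content host.
  raw_url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", blob_url)
  # Drop the "/blob/" segment, which raw.githubusercontent.com does not use.
  sub("/blob/", "/", raw_url, fixed = TRUE)
}

# Placeholder URL: OWNER, REPO, and the file path are not real.
raw_url <- blob_to_raw("https://github.com/OWNER/REPO/blob/main/data/file.RDS")
raw_url  # returns "https://raw.githubusercontent.com/OWNER/REPO/main/data/file.RDS"

# With a real raw URL, reading the file would then look like:
# library(readr)
# stocks <- read_rds(url(raw_url))
```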
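For the tidying step, one way to build month_end and year_end is to split contestant_period at the hyphen and then separate the trailing four-digit year from the month name. This is a sketch on hypothetical toy rows. The assumed value format "<start month>-<end month><year>" (for example "January-June1990") and the numbers in value are placeholders, so check the real column before relying on this. The sketch uses tidyr's separate() together with stringr's str_sub() and str_remove(). The Strings chapter listed in the readings covers more robust extraction tools.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy rows; the real file may not follow this format exactly.
stocks <- tibble(
  contestant_period = c("January-June1990", "July-December1990", "January-June1991"),
  variable = c("DJIA", "PROS", "DARTS"),
  value = c(1.0, 2.0, 3.0)
)

stocks_tidy <- stocks %>%
  # Split at the hyphen: "January" goes to month_start, "June1990" to end_part.
  separate(contestant_period, into = c("month_start", "end_part"), sep = "-") %>%
  mutate(
    # The last four characters of end_part are the year.
    year_end = as.integer(str_sub(end_part, -4)),
    # Everything before those four digits is the ending month.
    month_end = str_remove(end_part, "\\d{4}$")
  ) %>%
  # Keep only the tidy columns.
  select(month_end, year_end, variable, value)

stocks_tidy
```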
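For the saving step, readr's write_rds() and read_rds() handle the .rds round trip, including the optional read-back challenge. This sketch writes a small hypothetical tidy table to a temporary file so that it runs anywhere. In your project you would write to a path inside your repository instead.

```r
library(readr)
library(tibble)

# Hypothetical tidy rows standing in for the cleaned stock data.
stocks_tidy <- tibble(
  month_end = c("June", "December"),
  year_end = c(1990L, 1990L),
  variable = c("DJIA", "DJIA"),
  value = c(1.0, 2.0)
)

# tempfile() keeps this example out of the project folder.
path <- tempfile(fileext = ".rds")
write_rds(stocks_tidy, path)

# Optional challenge: read the saved file back and confirm nothing changed.
stocks_check <- read_rds(path)
identical(stocks_tidy, stocks_check)  # TRUE
```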
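For the DJIA table, the key moves are to filter to the DJIA rows and then spread one variable into columns with pivot_wider(). The target table is not reproduced here, so the layout below (one row per ending month, one column per year) is an assumption. Adjust names_from and the row ordering to match the table you are given. The input rows are hypothetical placeholders, not real returns.

```r
library(dplyr)
library(tidyr)

# Hypothetical tidy rows (placeholder values).
stocks_tidy <- tibble(
  month_end = c("June", "December", "June", "December", "June"),
  year_end = c(1990L, 1990L, 1991L, 1991L, 1990L),
  variable = c("DJIA", "DJIA", "DJIA", "DJIA", "PROS"),
  value = c(1.0, 2.0, 3.0, 4.0, 5.0)
)

djia_table <- stocks_tidy %>%
  filter(variable == "DJIA") %>%
  select(-variable) %>%
  # Order months by the calendar rather than alphabetically.
  mutate(month_end = factor(month_end, levels = month.name)) %>%
  arrange(year_end, month_end) %>%
  # One column per year, one row per ending month.
  pivot_wider(names_from = year_end, values_from = value)

djia_table
```

pivot_wider() keeps rows and new columns in order of first appearance, so sorting by year and then month before pivoting controls the order of both the rows and the year columns.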
Submit
In I-learn, submit a link to the .md file on GitHub.