19 Strings and Regex
Readings
- Chapter 14 Strings: R for Data Science (2nd ed)
- Chapter 15 Regular expressions: R for Data Science (2nd ed)
- Note: the stringi R package offers additional, less commonly needed functions for working with strings.
- stringr cheatsheet
- RVerbalExpressions package (optional)
- regexr.com (optional)
- Regular Expression examples (optional)
- Regular Expression support applet (optional)
Guided Instruction
Using grep (short for "global regular expression print") and regular expressions (regex) to find patterns in character strings is a valuable data analysis skill. Regular expressions are available in many programming languages and command-line tools, and they are very powerful once you understand them. The stringr package makes these tools much easier to use in R.
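In R, the base grep() function behaves much like its command-line namesake, and stringr offers consistently named equivalents. The following is a minimal sketch, assuming stringr is installed; the fruits vector is a made-up example, not course data.

```r
library(stringr)

# Hypothetical example vector, for illustration only
fruits <- c("apple", "banana", "cherry", "grape")

# Base R: grep() with value = TRUE returns the matching elements,
# much like command-line grep prints the matching lines
grep("ap", fruits, value = TRUE)   # "apple" "grape"

# stringr: str_detect() returns one TRUE/FALSE per element,
# str_subset() keeps only the elements that match
str_detect(fruits, "ap")           # TRUE FALSE FALSE TRUE
str_subset(fruits, "ap")           # "apple" "grape"
```

The stringr functions always take the string first and the pattern second, which makes them easy to chain in a pipeline.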
The three tasks below can be completed in many different ways, but generally should not require many lines of code.
- Use the readr::read_lines() function to read in each string: randomletters.txt and randomletters_wnumbers.txt.
- With the randomletters.txt file, pull out every 1700th letter (for example, positions 1, 1700, 3400, 5100, …) and find the hidden quote. The quote ends with a period.
- With the randomletters_wnumbers.txt file, find all the hidden numbers and convert them to letters using their order in the alphabet (1 = a, 2 = b, …, 26 = z) to decipher the message. Hint: the message starts with “experts”.
- With the randomletters.txt file, remove all the spaces and periods from the string, then find the longest sequence of vowels.
- Save your .R script (not a .qmd file) to your repository.
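For the first step, readr::read_lines() returns a character vector with one element per line of the file. Because the course files are not included here, this minimal sketch writes a small hypothetical stand-in to a temporary file and reads it back; it assumes readr is installed.

```r
library(readr)

# Hypothetical stand-in for randomletters.txt, written to a temp file
path <- tempfile(fileext = ".txt")
writeLines("abc def. ghi", path)

# read_lines() returns a character vector, one element per line
random_letters <- read_lines(path)
length(random_letters)   # 1
random_letters           # "abc def. ghi"
```

With the real files you would pass their paths to read_lines() instead. If a file turns out to span several lines, paste(x, collapse = "") joins them into a single string.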
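One way to approach the every-1700th-letter task is to build a vector of positions and let stringr::str_sub() pull out one character at each. This sketch uses a hypothetical toy string and a step of 3 so the result is easy to check by hand. The positions follow the example list in the task literally (1, then multiples of the step); that reading is an assumption.

```r
library(stringr)

# Toy input (hypothetical); the assignment uses the string from
# randomletters.txt and step <- 1700
text <- "hxexxlxxlxxoxx.xxz"
step <- 3

# Positions 1, step, 2 * step, ..., mirroring the task's example list
positions <- c(1, seq(step, str_length(text), by = step))

# str_sub() is vectorized over start/end, so this extracts one
# character per position; paste() glues them into one string
hidden <- paste(str_sub(text, positions, positions), collapse = "")

# Keep everything up to and including the first period
quote <- str_extract(hidden, "^[^.]*\\.")
quote   # "hello."
```

The regex "^[^.]*\\." reads as: from the start, any run of non-period characters, followed by a literal period.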
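For the numbers task, a pattern that matches whole runs of digits keeps multi-digit numbers such as 15 intact, and R's built-in letters vector maps 1 to "a" through 26 to "z". A minimal sketch on a hypothetical toy string, assuming stringr is installed:

```r
library(stringr)

# Toy input (hypothetical); the assignment uses randomletters_wnumbers.txt
text <- "xq15ab11z"

# "[0-9]+" matches each maximal run of digits, so "15" is one number
numbers <- as.integer(str_extract_all(text, "[0-9]+")[[1]])

# letters is R's built-in vector "a", "b", ..., "z",
# so letters[n] gives the nth letter of the alphabet
decoded <- paste(letters[numbers], collapse = "")
decoded   # "ok"
```

str_extract_all() returns a list with one element per input string, which is why [[1]] is needed for a single string.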
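For the vowel task, removing spaces and periods first lets a vowel run continue across them; the remaining step is to extract every run of vowels and keep the longest. This sketch assumes the vowels are lowercase a, e, i, o, u (not y) and uses a hypothetical toy string.

```r
library(stringr)

# Toy input (hypothetical); the assignment uses randomletters.txt
text <- "the sea. out"

# Drop spaces and periods so vowel runs can join across them
clean <- str_remove_all(text, "[ .]")   # "theseaout"

# Every maximal run of vowels (assumption: lowercase a, e, i, o, u)
runs <- str_extract_all(clean, "[aeiou]+")[[1]]

# Longest run; which.max() returns the first one if there is a tie
longest <- runs[which.max(str_length(runs))]
longest   # "eaou"
```

Without the removal step, the toy string would yield the separate runs "ea" and "ou" instead of "eaou".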
Submit
In I-Learn, submit a link to the script file on GitHub.