4 Visualizing Large Distributions (i.e. many rows) with a Small Number of Layers
Readings
You just started your internship at a big firm in New York, and your manager gave you an extensive file of flights that departed JFK, LGA, or EWR in 2013. From this data (which you can obtain in R), your manager wants you to draw some insights.
Before we can start to answer business questions we need to become familiar with our data. If one is provided, you may want to start with the data dictionary. However, you can also just dive into the data and gain an understanding based on the variable names and types.
We will also want to know how the variables relate to each other, which we can summarize with tables or visualizations. At this point, we are deepening our understanding as well as beginning our analysis.
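A first look at the data might be sketched as follows. The choice of `origin` and `dep_delay`, and the name `delay_by_origin`, are illustrative assumptions rather than part of the task; any pair of variables could be summarized the same way.

```r
library(tidyverse)

# Access the dataset with `::` so the nycflights13 package is not attached
flights <- nycflights13::flights

# Column names, types, and the first few values of each variable
glimpse(flights)

# A small summary table: flight counts and departure delays by origin airport
delay_by_origin <- flights |>
  group_by(origin) |>
  summarise(
    n_flights = n(),
    median_dep_delay = median(dep_delay, na.rm = TRUE),
    pct_missing_delay = mean(is.na(dep_delay)) * 100
  )

delay_by_origin
```

The `pct_missing_delay` column is worth a glance before plotting, since rows with a missing departure delay will be dropped from any plot of that variable.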
- Create a new `.qmd` to do this task. Don’t forget to load (not install) the tidyverse package.
- Use the `nycflights13::flights` dataset. (You may need to install the `nycflights13` package first.)
  - Note that the double colon allows you to access functions and datasets from a package without loading the entire package.
  - You can run `?nycflights13::flights` in the console to learn more about the variables in the dataset. (Don’t include this in your Quarto file.)
- Pick two variables (columns) whose relationship you would like to explore.
- Provide a visualization of the univariate distribution of each of the selected variables separately (i.e. 2 plots are needed here, 1 for each variable)
- Build bivariate summaries of the variables you have chosen to investigate (1 plot is needed here, the plot should contain both variables in it).
- Write one to two paragraphs in the `.qmd` summarizing insights from your graphics and your data presentation choices.
- Render your `.qmd` file. Push all the files created in the rendering process into your GitHub repository.
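As one possible starting point for the plotting steps above, the sketch below builds two univariate plots and one bivariate plot, each with a single layer. The variables `distance` and `dep_delay`, the bin widths, and the object names are illustrative assumptions, not choices the task prescribes. `geom_bin2d()` is used here as one way to summarize several hundred thousand rows without overplotting; other geoms may suit your variables better.

```r
library(tidyverse)

# Drop flights with no recorded departure delay so the plots do not warn
delays <- nycflights13::flights |>
  filter(!is.na(dep_delay))

# Univariate: distribution of departure delay, in 10-minute bins
p_delay <- ggplot(delays, aes(x = dep_delay)) +
  geom_histogram(binwidth = 10) +
  labs(x = "Departure delay (minutes)", y = "Number of flights")

# Univariate: distribution of flight distance, in 100-mile bins
p_distance <- ggplot(delays, aes(x = distance)) +
  geom_histogram(binwidth = 100) +
  labs(x = "Distance (miles)", y = "Number of flights")

# Bivariate: count flights in a 50 x 50 grid instead of drawing every point
p_both <- ggplot(delays, aes(x = distance, y = dep_delay)) +
  geom_bin2d(bins = 50) +
  labs(
    x = "Distance (miles)",
    y = "Departure delay (minutes)",
    fill = "Flights"
  )

p_delay
p_distance
p_both
```

Binning trades individual points for counts per cell, which keeps the plot readable when most flights cluster near zero delay. Your written summary can explain why you chose the bin sizes you did.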
Submit
In I-learn, submit a link to the `.md` file on GitHub.