pacman::p_load(
tidyverse, # ggplot, mutate(), cleaning...
tsibble, # as_tsibble()
fable, # model(...), forecast(), tidy(), glance()...
feasts, # ACF(), PACF()
ggtime, # autoplot() for tsibbles
patchwork, # + and / for ggplots
rio # import()
)
White Noise and Random Walks - Part 1
Chapter 4: Lesson 1
Learning Outcomes
Characterize the properties of discrete white noise
- Define Residual error
- Define discrete white noise (DWN)
- Define Gaussian white noise
- Simulate Gaussian white noise with R
- Plot DWN simulation results
- State DWN second order properties
- Explain how to estimate (or fit) a DWN process
- State the assumptions needed to categorize residual error series as white noise
Characterize the properties of a random walk
- Define a random walk
Simulate realizations from basic time series models in R
- Simulate a random walk
- Plot a random walk
Preparation
- Read Sections 4.1-4.2, 4.3.1-4.3.5
Learning Journal Exchange (10 min)
Review another student’s journal
What would you add to your learning journal after reading another student’s?
What would you recommend the other student add to their learning journal?
Sign the Learning Journal review sheet for your peer
Packages
Class Activity: White Noise (15 min)
Definition
In this class, we are learning to investigate different types of time series. Up to this point, we have focused mostly on time series with distinct seasonal behavior. We will now focus on stochastic processes (also called random processes), which do not necessarily have a seasonal component. We begin with white noise.
Dice Activity
You will receive a die from the professor. Roll the die 30 times and draw your results on the board using a scatter plot. Your final result should look something like this:
results <- replicate(30, sum(sample(1:6, 1, replace = TRUE)))
df <- data.frame(Draw = 1:30, Value = results)
ggplot(df, aes(x = Draw, y = Value)) +
geom_point(color = "steelblue", size = 2) +
geom_line()+
scale_x_continuous(breaks = 1:30) +
scale_y_continuous(breaks = seq(1, 6 * 1, 1)) +
labs(
title = paste("Simulation of", 30, "Draws with", 1, "Dice"),
x = "Draw Number",
y = "Sum of Dice Roll(s)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 6))
Now repeat the process with two dice, recording the sum of the two dice on each roll. Add a histogram showing the number of occurrences of each sum, like this:
results <- replicate(30, sum(sample(1:6, 2, replace = TRUE)))
df <- data.frame(Draw = 1:30, Value = results)
ggplot(df, aes(x = Value)) +
geom_histogram(binwidth = 1, fill = "steelblue", color = "black") +
scale_x_continuous(breaks = seq(2, 6 * 2, 1)) +
labs(
title = "Distribution of Dice Roll Sums",
x = "Sum of Dice Roll(s)",
y = "Frequency"
) +
theme_minimal()
Simulation
The following simulation illustrates a white noise time series.
Type I Errors
In your introductory statistics course, you probably learned about Type I error: rejecting a null hypothesis that is actually true. At a significance level of \(\alpha = 0.05\), this happens in about 5% of tests when the null hypothesis holds.
When we create a correlogram, we conduct one hypothesis test for each lag \(k\). With so many hypothesis tests, it is not surprising if a few of them show a significant correlation due to chance alone. For this reason, we tend to disregard correlations that are barely significant and have no plausible explanation.
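The effect of repeated testing can be sketched with a short simulation. The example below is a minimal illustration in base R, not part of the original lesson code: the seed, series length, and number of lags are arbitrary assumptions, and the bounds \(\pm 1.96/\sqrt{n}\) are the usual approximate 95% limits drawn on a correlogram.

```r
# Sketch: count "significant" sample autocorrelations in pure white noise.
set.seed(123)      # arbitrary seed (assumption)
n <- 500           # series length (assumption)
max_lag <- 40      # number of lags tested (assumption)

w <- rnorm(n)      # Gaussian white noise, so every true autocorrelation is 0

# Sample autocorrelations for lags 1..max_lag (lag 0 is always 1, so drop it)
r <- acf(w, lag.max = max_lag, plot = FALSE)$acf[-1]

# Approximate 95% bounds used on a correlogram
bound <- 1.96 / sqrt(n)

# Lags flagged as significant even though no correlation exists
n_flagged <- sum(abs(r) > bound)
n_flagged

# Expected number of false positives at the 5% level
0.05 * max_lag
```

Any lag counted in `n_flagged` is a Type I error, because the series was generated with no autocorrelation at all.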
Visualizing White Noise
# Set random seed
set.seed(10)
# Specify means and standard deviation
n <- 2500 # number of points
white_noise_sigma <- rnorm(1, 5, 1) # choose a random standard deviation
# White noise data
white_noise_df <- data.frame(x = rnorm(n, 0, white_noise_sigma))
The first 250 points in this time series are illustrated here:
white_noise_df |>
mutate(t = 1:nrow(white_noise_df)) |>
head(250) |>
ggplot(aes(x = t, y = x)) +
geom_line() +
theme_bw() +
labs(
x = "Time",
y = "Values",
title = "First 250 Values of a Gaussian White Noise Time Series"
) +
theme(
plot.title = element_text(hjust = 0.5)
)
Here is a histogram of the 2500 values from this DWN process.
white_noise_df |>
mutate(density = dnorm(x, mean(white_noise_df$x), sd(white_noise_df$x))) |>
ggplot(aes(x = x)) +
geom_histogram(aes(y = after_stat(density)),
color = "white", fill = "#56B4E9", binwidth = 1) +
geom_line(aes(x = x, y = density)) +
theme_bw() +
labs(
x = "Values",
y = "Density",
title = "Histogram of Values from a Gaussian White Noise Process"
) +
theme(
plot.title = element_text(hjust = 0.5)
)
Notice that the values are approximately normally distributed. This is consistent with a Gaussian white noise process, although normality alone is not enough: the values must also be uncorrelated over time.
Second-Order Properties of Discrete White Noise
When we refer to the second-order properties of a time series, we are talking about its variance and covariance. The mean is a first-order property; the variance and covariance are second-order properties.
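For reference, the standard textbook properties of a DWN series \(\{w_t\}\) can be written as follows. The symbols \(\mu_w\), \(\sigma^2\), \(\gamma_k\), and \(\rho_k\) are assumed here to denote the mean, variance, autocovariance at lag \(k\), and autocorrelation at lag \(k\), respectively.
\[ \mu_w = 0, \qquad \gamma_k = \operatorname{Cov}(w_t, w_{t+k}) = \begin{cases} \sigma^2, & k = 0 \\ 0, & k \ne 0 \end{cases} \qquad \rho_k = \begin{cases} 1, & k = 0 \\ 0, & k \ne 0 \end{cases} \]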
Note that the properties given above are theoretical properties of the population, not estimates computed using a sample. The sample autocorrelations will not equal zero, due to randomness inherent in sampling.
Fitting the White Noise Model
Typically, a DWN series arises in the random component of another time series. If we have fully explained the level and seasonality in the time series, then the only component left is the random component, which would ideally follow a DWN process.
Since the mean of a DWN time series is zero, the only parameter we need to fit is the variance.
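As a minimal sketch of this idea, the example below treats a simulated series as if it were the residual component of a fitted model and estimates its variance. The name `residual_series`, the seed, and the true standard deviation of 2 are hypothetical choices for illustration only.

```r
# Sketch: "fitting" a DWN model amounts to estimating its variance.
set.seed(42)       # arbitrary seed (assumption)
true_sigma <- 2    # hypothetical true standard deviation

# Stand-in for the residual (random) component of a decomposed series
residual_series <- rnorm(1000, mean = 0, sd = true_sigma)

mean_hat <- mean(residual_series)    # should be close to 0
sigma2_hat <- var(residual_series)   # sample variance estimates sigma^2

c(mean = mean_hat, variance = sigma2_hat, true_variance = true_sigma^2)
```

With real data, `residual_series` would be replaced by the residuals left after removing trend and seasonality.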
Class Activity: Random Walks (15 min)
Definitions
Consider moving on a number line, where your movements are determined by a discrete white noise (DWN) process. Each successive value indicates how far you will move along the number line from your current position. This is mathematically equivalent to allowing your position at time \(t\) to be the sum of all the observed DWN values up to time \(t\).
The value \(x_t\) can be considered as the cumulative summation of the first \(t\) values of the \(w_t\) series. In many cases, \(w_t\) is a discrete white noise series, and it is often modeled as a Gaussian white noise series. However, \(w_t\) could be as simple as a coin toss, as illustrated in the next activity.
Simulating a Random Walk
In this activity, we will simulate a discrete-time, discrete-space random walk.
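A coin-toss walk of this kind can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the class activity itself: heads is coded as a step of +1, tails as a step of -1, the walk starts at 0, and the seed and number of tosses are arbitrary.

```r
# Sketch: discrete-time, discrete-space random walk driven by coin tosses.
set.seed(7)        # arbitrary seed (assumption)
n_steps <- 60      # number of tosses (assumption)

# Each toss moves the walker one unit: tails = -1, heads = +1
steps <- sample(c(-1, 1), size = n_steps, replace = TRUE)

# Position after each toss is the cumulative sum of the steps
position <- cumsum(steps)

plot(seq_len(n_steps), position, type = "s",
     xlab = "Toss", ylab = "Position",
     main = "Coin-Toss Random Walk")
```

Replacing `steps` with draws from `rnorm()` would give a random walk driven by Gaussian white noise instead.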
Representations for a Random Walk
Recall the definition of a random walk:
\(\{x_t\}\) is a random walk if it can be expressed as \[ x_{t} = x_{t-1} + w_{t} \] where \(\{w_t\}\) is a white noise series.
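Substituting this definition into itself repeatedly (back substitution) gives an equivalent representation. The sketch below assumes the walk starts at \(x_0 = 0\); that starting value is an assumption made here for illustration.
\[ x_t = x_{t-1} + w_t = x_{t-2} + w_{t-1} + w_t = \cdots = x_0 + w_1 + w_2 + \cdots + w_t = \sum_{i=1}^{t} w_i \]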
Class Activity: Backward Shift Operator (10 min)
Definition of the Backward Shift Operator
This process of back substitution is so common, we define notation to handle it.
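In the usual notation, the backward shift operator \(\mathbf{B}\) moves a series back one time step:
\[ \mathbf{B} x_t = x_{t-1} \]
Applying the operator \(n\) times gives
\[ \mathbf{B}^n x_t = x_{t-n} \]
For example, the random walk \(x_t = x_{t-1} + w_t\) can be written as \((1 - \mathbf{B}) x_t = w_t\).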
Properties of the Backshift Operator
The backward shift operator is a linear operator. So, if \(a\), \(b\), \(c\), and \(d\) are constants, then
\[ (a \mathbf{B} + b)x_t = a \mathbf{B} x_t + b x_t \]
The distributive property also holds.
\[\begin{align*}
(a \mathbf{B} + b)(c \mathbf{B} + d) x_t &= c (a \mathbf{B} + b) \mathbf{B} x_t + d(a \mathbf{B} + b) x_t \\
&= a \mathbf{B} (c \mathbf{B} + d) x_t + b (c \mathbf{B} + d) x_t \\
&= \left( ac \mathbf{B}^2 + (ad+bc) \mathbf{B} + bd \right) x_t \\
&= ac \mathbf{B}^2 x_t + (ad+bc) \mathbf{B} x_t + (bd) x_t
\end{align*}\]
We will practice applying this operator.
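One way to check these properties numerically is sketched below. The helper `backshift()` is hypothetical (it is not from any package used in this lesson); it shifts a numeric vector back by `n` steps and pads the start with `NA`. The constants and seed are arbitrary.

```r
# Hypothetical helper: apply B^n to a numeric vector, padding with NA
backshift <- function(x, n = 1) {
  if (n == 0) return(x)
  c(rep(NA_real_, n), head(x, -n))
}

set.seed(1)               # arbitrary seed (assumption)
w <- rnorm(20)            # white noise
x <- cumsum(w)            # random walk, assuming x_0 = 0

# (1 - B) x_t should recover w_t for t >= 2
first_difference <- x - backshift(x)
all.equal(first_difference[-1], w[-1])

# Check the expanded product with arbitrary constants
# (cc is used instead of c to avoid masking the c() function)
a <- 2; b <- 3; cc <- -1; d <- 4
inner <- cc * backshift(x) + d * x              # (cB + d) x_t
nested <- a * backshift(inner) + b * inner      # (aB + b) applied to the result
expanded <- a * cc * backshift(x, 2) +
  (a * d + b * cc) * backshift(x) +
  b * d * x
all.equal(nested, expanded)
```

Both comparisons agree up to floating-point tolerance; the leading `NA` values mark time points where a shifted value does not exist.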
Class Activity: Properties of Random Walks (5 min)
Simulation
The following simulation illustrates a random walk.
Second-Order Properties of a Random Walk
The second-order properties of a random walk are summarized below.
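For a random walk built from white noise with mean zero and variance \(\sigma^2\), and assuming the walk starts at \(x_0 = 0\), the standard results for \(k \ge 0\) are:
\[ \mu_x = 0, \qquad \gamma_k(t) = \operatorname{Cov}(x_t, x_{t+k}) = t \sigma^2, \qquad \rho_k(t) = \frac{t \sigma^2}{\sqrt{t \sigma^2 \, (t+k) \sigma^2}} = \frac{1}{\sqrt{1 + k/t}} \]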
Note that the covariance of a random walk process depends on \(t\). Hence, random walks are non-stationary. The variance also grows without bound as \(t\) increases, so the uncertainty of a forecast grows with the forecast horizon. As a result, a random walk model will not provide good long-term predictions.
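The growth of the variance can be illustrated by simulating many independent random walks and computing the variance across walks at each time point. This is a rough sketch with arbitrary settings (2000 walks, 100 steps, \(\sigma = 1\)), assuming each walk starts at 0.

```r
# Sketch: the variance of a random walk at time t is roughly t * sigma^2.
set.seed(2024)     # arbitrary seed (assumption)
n_walks <- 2000    # number of simulated walks (assumption)
n_steps <- 100     # length of each walk (assumption)
sigma <- 1         # white noise standard deviation (assumption)

# Each column holds the white noise for one walk
w <- matrix(rnorm(n_walks * n_steps, sd = sigma), nrow = n_steps)

# Cumulative sums down each column turn the noise into random walks
walks <- apply(w, 2, cumsum)

# Sample variance across walks at each time point
var_by_time <- apply(walks, 1, var)

# Compare with the theoretical value t * sigma^2 at a few time points
check_times <- c(10, 50, 100)
data.frame(
  t = check_times,
  sample_variance = var_by_time[check_times],
  theoretical = check_times * sigma^2
)
```

The sample variances should track the theoretical values closely and keep increasing with \(t\).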
Note that if \(0 < k \ll t\), then \(\rho_k \approx 1\). Because of this, a correlogram for a random walk will typically demonstrate positive autocorrelations that start near 1 and slowly decrease as \(k\) increases. This is exactly what we observed in the simulation above.
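This slow decay can also be reproduced with a few lines of base R. The sketch below uses an arbitrary seed and series length, and assumes Gaussian white noise steps.

```r
# Sketch: sample autocorrelations of a simulated random walk.
set.seed(99)                 # arbitrary seed (assumption)
x <- cumsum(rnorm(1000))     # random walk driven by Gaussian white noise

# Sample autocorrelations for lags 1..20 (lag 0 dropped)
r <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]

# Autocorrelations at a few selected lags
round(r[c(1, 5, 10, 20)], 3)
```

Calling `acf(x)` without `plot = FALSE` draws the corresponding correlogram.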
Homework Preview (5 min)
- Review upcoming homework assignment
- Clarify questions
