White Noise and Random Walks - Part 2

Chapter 4: Lesson 2

Learning Outcomes

Characterize the properties of a random walk
  • Define the second order properties of a random walk
  • Define the backward shift operator
  • Use the backward shift operator to state a random walk as a sequence of white noise realizations
  • Define a random walk with drift
Simulate realizations from basic time series models in R
  • Simulate a random walk
  • Plot a random walk
Fit time series models to data and interpret fitted parameters
  • Motivate the need for differencing in time series analysis
  • Define the difference operator
  • Explain the relationship between the difference operator and the backward shift operator
  • Test whether a series is a random walk using first differences
  • Explain how to estimate a random walk with increasing slope using Holt-Winters
  • Estimate the drift parameter of a random walk

Preparation

  • Read Sections 4.3.3-4.3.7 and 4.4

Learning Journal Exchange (10 min)

  • Review another student’s journal

  • What would you add to your learning journal after reading another student’s?

  • What would you recommend the other student add to their learning journal?

  • Sign the Learning Journal review sheet for your peer

Class Activity: Differencing a Time Series (15 min)

Example: McDonald’s Stock Prices

Computing the difference between successive terms of a random walk leads to a discrete white noise series.

\[\begin{align*} x_t &= x_{t-1} + w_t \\ x_t - x_{t-1} &= w_t \end{align*}\]
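The identity above can be checked numerically. The following is a minimal sketch on simulated data; the seed, the series length, and the standard normal white noise are assumptions chosen only for illustration. It builds a random walk as a cumulative sum of white noise and confirms that the first differences return the white noise terms.

```r
# Simulate a random walk and recover its white noise by differencing.
set.seed(42)        # arbitrary seed, for reproducibility only
n <- 500            # hypothetical series length
w <- rnorm(n)       # white noise: w_t ~ N(0, 1)
x <- cumsum(w)      # random walk: x_t = x_{t-1} + w_t, taking x_0 = 0

d <- diff(x)        # first differences x_t - x_{t-1}, for t = 2, ..., n

# The differences reproduce w_2, ..., w_n (up to floating-point error)
all.equal(d, w[-1])
```

Note that differencing a series of length \(n\) yields \(n-1\) values, because the first observation has no predecessor.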

In many cases, differencing sequential terms of a non-stationary process can lead to a stationary process of differences. We can use the code below to obtain the daily closing stock prices for any publicly-traded company.

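# Packages assumed by the code in this lesson (not shown in the original listing)
library(tidyverse)   # dplyr, ggplot2, and the %>% pipe
library(lubridate)   # ymd()
library(tidyquant)   # tq_get()
library(tsibble)     # as_tsibble()
library(feasts)      # attaches fabletools, which provides autoplot() for tsibbles
library(plotly)      # plot_ly()

# Set symbol and date range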
symbol <- "MCD"
company <- "McDonald's"
date_start <- "2020-07-01"
date_end <- "2024-01-01"

# Fetch stock prices (can be used to get new data)
stock_df <- tq_get(symbol, from = date_start, to = date_end, get = "stock.prices")

# Transform data into tsibble
stock_ts <- stock_df %>%
  mutate(
    dates = date, 
    value = adjusted
  ) %>%
  dplyr::select(dates, value) %>%
  as_tibble() %>% 
  arrange(dates) |>
  mutate(diff = value - lag(value)) |>
  as_tsibble(index = dates, key = NULL)

The time plot in Figure 1 presents the closing price of McDonald’s stock from 01/07/2020 to 01/01/2024. Figure 2 gives the differences in the closing prices of the stock as a time series.

plot_ly(stock_ts, x = ~dates, y = ~value, type = 'scatter', mode = 'lines') %>%
  layout(
    xaxis = list(title = paste0("Dates (", format(ymd(date_start), "%d/%m/%Y"), " to ", format(ymd(date_end), "%d/%m/%Y"), ")" ) ),
    yaxis = list(title = "Closing Price (US$)"),
    title = paste0("Time Plot of ", symbol, " Daily Closing Price")
  )
Figure 1: Plot of the daily closing price of the stock
# Generate time series plot using plot_ly
plot_ly(stock_ts, x = ~dates, y = ~diff, type = 'scatter', mode = 'lines') %>%
  layout(
    xaxis = list(title = paste0("Dates (", format(ymd(date_start), "%d/%m/%Y"), " to ", format(ymd(date_end), "%d/%m/%Y"), ")" ) ),
    yaxis = list(title = "Difference in Closing Price (US$)"),
    title = paste0("Difference of ", symbol, " Daily Closing Price")
  )
Figure 2: Plot of the stock price differences

Figure 3 is the correlogram for the original McDonald’s stock price time series. Figure 4 gives the correlogram for the differences in successive closing stock prices.

acf(stock_ts$value, plot=TRUE, type = "correlation", lag.max = 25)
Figure 3: Correlogram of the stock prices
acf(stock_ts$diff |> na.omit(), plot=TRUE, type = "correlation", lag.max = 25)
Figure 4: Correlogram of the stock price differences

Figure 5 is a histogram of the differences, with the corresponding normal density superimposed. Below the histogram, we give the variance of the differences in the stock prices. This is a simple measure of the volatility of the stock, or in other words, how much the price changes in a day.

# Histogram of differences in stock prices
stock_ts |>
  mutate(
    density = dnorm(diff, mean(stock_ts$diff, na.rm = TRUE), sd(stock_ts$diff, na.rm = TRUE))
  ) |>
  ggplot(aes(x = diff)) +
    geom_histogram(aes(y = after_stat(density)),
        color = "white", fill = "#56B4E9", binwidth = 1) +
    geom_line(aes(x = diff, y = density)) +
    theme_bw() +
    labs(
      x = "Difference",
      y = "Density",
      title = "Histogram of Difference in the Closing Stock Prices"
    ) +
    theme(
      plot.title = element_text(hjust = 0.5)
    )
Figure 5: Histogram of the stock price differences

The variance of the differences is 5.853.

Notice that the values in the correlogram of the stock prices start at 1 and slowly decay as \(k\) increases. There are no significant autocorrelations in the differenced values. This is exactly what we would expect from a random walk. It is also interesting that the differences are nearly normally distributed and uncorrelated.
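The visual check of the correlogram can be supplemented with numbers. The sketch below uses simulated data rather than the downloaded stock prices; the seed, series length, starting value, and noise standard deviation are all hypothetical. It counts how many sample autocorrelations of the differences fall outside the approximate 95% bounds \(\pm 1.96/\sqrt{n}\), and it also runs a Ljung-Box test with `Box.test()` from base R. The Ljung-Box test is not part of this lesson's procedure; it is included here only as one possible formal check.

```r
# Check whether the first differences of a series behave like white noise.
set.seed(123)                          # arbitrary seed
n <- 880                               # hypothetical number of trading days
w <- rnorm(n, mean = 0, sd = 2.4)      # hypothetical daily shocks
x <- 200 + cumsum(w)                   # simulated random walk starting near 200
d <- diff(x)                           # first differences

# Sample autocorrelations of the differences at lags 1 to 25 (lag 0 dropped)
acf_vals <- acf(d, lag.max = 25, plot = FALSE)$acf[-1]

# Approximate 95% bounds, as drawn on the correlogram
bound <- 1.96 / sqrt(length(d))
n_outside <- sum(abs(acf_vals) > bound)
n_outside                              # number of lags outside the bounds

# Ljung-Box test of the null hypothesis of no autocorrelation up to lag 25
lb <- Box.test(d, lag = 25, type = "Ljung-Box")
lb$p.value
```

For white noise, roughly 1 in 20 lags is expected to fall outside the bounds by chance, so a single significant lag is not by itself evidence against the random walk model.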

Definition of the Difference Operator

Differencing nonstationary time series often leads to a stationary series, so we will define a formal operator to express this process.

Definition of the Difference Operator

The difference operator, \(\nabla\), is defined as:

\[\nabla x_t = x_t - x_{t-1} = (1-\mathbf{B}) x_t\]

Higher-order differencing can be denoted

\[\nabla^n x_t = (1-\mathbf{B})^n x_t\]

To see what this expression gives us, note that \(\nabla\) gives a new time series made up of the differences between successive terms of the original time series. The operator \(\nabla^2\) generates a time series made up of the differences between successive terms of the differenced time series. It is the difference of the differences, or the second difference.
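In R, the base function `diff()` computes these quantities. The sketch below uses a short made-up series (not the one in the exercise that follows) to show that differencing twice, calling `diff()` with `differences = 2`, and expanding \((1-\mathbf{B})^2 = 1 - 2\mathbf{B} + \mathbf{B}^2\) all give the same second differences.

```r
# Second differences computed three equivalent ways.
x <- c(3, 7, 4, 9, 12, 6)               # hypothetical series, for illustration only
n <- length(x)

d1 <- diff(x)                           # first differences: x_t - x_{t-1}
d2_nested <- diff(d1)                   # difference of the differences
d2_direct <- diff(x, differences = 2)   # same result in one call

# Expansion of (1 - B)^2 x_t = x_t - 2 x_{t-1} + x_{t-2}, for t = 3, ..., n
d2_expanded <- x[3:n] - 2 * x[2:(n - 1)] + x[1:(n - 2)]

d1           # 4 -3 5 3 -6
d2_direct    # -7 8 -2 -9
```

Each round of differencing shortens the series by one value, so the second differences start at \(t = 3\).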

Check Your Understanding

Consider the following time series, where \(n=8\):

\(x_{1} = 5\),\(~\) \(x_{2} = 10\),\(~\) \(x_{3} = 13\),\(~\) \(x_{4} = 8\),\(~\) \(x_{5} = 4\),\(~\) \(x_{6} = 3\),\(~\) \(x_{7} = 9\),\(~\) \(x_{8} = 2\)

  • Find the first differences, \(\nabla x_t\)
  • Find the second differences, \(\nabla^2 x_t\).
  • Fill in the missing steps: \[\begin{align*} \nabla^2 x_8 &= (1-\mathbf{B} )^2 x_8 \\ &= (1-\mathbf{B} ) \left[ (1-\mathbf{B} ) x_8 \right] \\ & ~~~~~~~~~~~~~~~~~~~~~~ ⋮ \\ &= (x_8-x_7)-(x_7-x_6) \end{align*}\] and check that this is equal to the last term in the sequence of second differences.
$$t$$ $$x_t$$ $$\nabla x_t$$ $$\nabla^2 x_t$$
1 5
2 10
3 13
4 8
5 4
6 3
7 9
8 2

Small-Group Activity: Computing Differences

The difference operator can be helpful in identifying the functional underpinnings of a trend. If a function is linear, then the first differences of equally-spaced values will be constant. If a function is quadratic, then the second differences of equally-spaced values will be constant. If a function is cubic, then the third differences of equally-spaced values will be constant, and so on.
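As an illustration of this property, the sketch below differences three polynomials evaluated at equally spaced times. The coefficients are hypothetical and deliberately differ from the tables in this activity.

```r
# Differences of polynomial trends at equally spaced time points.
time_index <- 1:9

linear    <- 2 + 3 * time_index                  # degree 1
quadratic <- time_index^2 - 4 * time_index + 1   # degree 2
cubic     <- time_index^3                        # degree 3

diff(linear)                        # first differences: every entry is 3
diff(quadratic, differences = 2)    # second differences: every entry is 2
diff(cubic, differences = 3)        # third differences: every entry is 6
```

With real data the differences will not be exactly constant, but differences that fluctuate around a constant level without a trend suggest a polynomial trend of the corresponding degree.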

Compute the differences specified below.

Linear

$$t$$ $$x_t$$ $$\nabla x_t$$
1 7.5
2 10
3 12.5
4 15
5 17.5
6 20
7 22.5
8 25
9 27.5

Quadratic

$$t$$ $$x_t$$ $$\nabla x_t$$ $$\nabla^2 x_t$$
1 15
2 2
3 -7
4 -12
5 -13
6 -10
7 -3
8 8
9 23

Cubic

$$t$$ $$x_t$$ $$\nabla x_t$$ $$\nabla^2 x_t$$ $$\nabla^3 x_t$$
1 -2.7
2 0
3 1
4 0.9
5 0.3
6 -0.2
7 0
8 1.5
9 4.9

Small-Group Activity: Differencing Stock Prices (15 min)

In this activity, you will apply what you have learned to a new stock.

Check Your Understanding

Modify the code used to get the prices of McDonald’s stock to download closing stock prices for a different publicly-traded company over a time period of your choice.

# Set symbol and date range
symbol <- "MCD"               # Stock trading symbol for McDonald's
date_start <- "2020-07-01"
date_end <- "2024-01-01"

# Fetch stock prices (can be used to get new data)
stock_df <- tq_get(symbol, from = date_start, to = date_end, get = "stock.prices")

# Transform data into tsibble
stock_ts <- stock_df %>%
  mutate(
    dates = date, 
    value = adjusted
  ) %>%
  dplyr::select(dates, value) %>%
  as_tibble() %>% 
  arrange(dates) |>
  mutate(diff = value - lag(value)) |>
  as_tsibble(index = dates, key = NULL)


plot_ly(stock_ts, x = ~dates, y = ~value, type = 'scatter', mode = 'lines') %>%
  layout(
    xaxis = list(title = paste0("Dates (", format(ymd(date_start), "%d/%m/%Y"), " to ", format(ymd(date_end), "%d/%m/%Y"), ")" ) ),
    yaxis = list(title = "Closing Price (US$)"),
    title = paste0("Time Plot of ", symbol, " Daily Closing Price")
  )

# Generate time series plot using plot_ly
plot_ly(stock_ts, x = ~dates, y = ~diff, type = 'scatter', mode = 'lines') %>%
  layout(
    xaxis = list(title = paste0("Dates (", format(ymd(date_start), "%d/%m/%Y"), " to ", format(ymd(date_end), "%d/%m/%Y"), ")" ) ),
    yaxis = list(title = "Difference in Closing Price (US$)"),
    title = paste0("Difference of ", symbol, " Daily Closing Price")
  )

# Autocorrelation function for stock prices
acf(stock_ts$value, plot=TRUE, type = "correlation", lag.max = 25)

# Autocorrelation function for differences
acf(stock_ts$diff |> na.omit(), plot=TRUE, type = "correlation", lag.max = 25)

# Histogram of differences in stock prices
stock_ts |>
  mutate(
    density = dnorm(diff, mean(stock_ts$diff, na.rm = TRUE), sd(stock_ts$diff, na.rm = TRUE))
  ) |>
  ggplot(aes(x = diff)) +
  geom_histogram(aes(y = after_stat(density)),
                 color = "white", fill = "#56B4E9", binwidth = 1) +
  geom_line(aes(x = diff, y = density)) +
  theme_bw() +
  labs(
    x = "Difference",
    y = "Density",
    title = "Histogram of Difference in the Closing Stock Prices"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5)
  )

# Variance of the differences
var(stock_ts$diff, na.rm = TRUE)|> round(3)

Do the following.

  • Indicate which company you have chosen, the stock symbol, and the time period.
  • Create a time plot of the daily closing stock prices.
  • Produce a time plot of the differences in the daily closing stock prices.
  • Create a correlogram of the stock prices.
  • Create a correlogram of the differences.
  • Generate a histogram of the difference in the stock prices and superimpose the corresponding normal density.
  • Compute the variance of the differences.
  • Compare your results with those from the other teams of students.

Optional Activity: Integrated Autoregressive Model (10 min)

Click here for further information about Section 4.4.2 in the book

Class Activity: Random Walk with Drift (15 min)

We will now consider the daily closing price of Abercrombie & Fitch stock (Symbol = ANF). Here is a time series plot of the closing stock prices.

# Set symbol and date range
symbol <- "ANF"                # Abercrombie & Fitch stock trading symbol
date_start <- "2023-05-01"
date_end <- "2024-02-20"

# Fetch stock prices
df_stock <- tq_get(symbol, from = date_start, to = date_end, get = "stock.prices")

# Transform data into tsibble
df_tsibble <- df_stock |>
  mutate(
    dates = date, 
    value = close
  ) |>
  dplyr::select(dates, value) |>
  as_tibble() |> 
  arrange(dates) |>
  as_tsibble(index = dates, key = NULL)

# Generate time series plot using plot_ly
plot_ly(df_tsibble, x = ~dates, y = ~value, type = 'scatter', mode = 'lines') |>
  layout(
    xaxis = list(title = "Date"),
    yaxis = list(title = "Value"),
    title = paste0("Time Plot of ", symbol, " Daily Closing Price (", format(ymd(date_start), "%d %b %Y"), " - ", format(ymd(date_end), "%d %b %Y"),")")
  )
Figure 6: Time plot of the daily prices of ANF stock

We now generate a time plot and a correlogram of the differences. (No stock prices are recorded on weekends or holidays. Due to the gaps in the data, we will use the base R acf command, rather than the feasts ACF command.)

# dplyr::lag() shifts the values by one position, giving x_t - x_{t-1}
df_tsibble$diff <- df_tsibble$value - dplyr::lag(df_tsibble$value)
df_tsibble |>
  na.omit() |>
  autoplot(.vars = diff) + 
    labs(
      title = paste("Time Plot of Differences in Daily", symbol, "Stock Prices"),
      subtitle = 
        paste0(
          format(ymd(date_start), "%d %b %Y"),
            " - ",
          format(ymd(date_end), "%d %b %Y")
          ),
      x = "Date",
      y = "Difference",
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5),
      plot.subtitle = element_text(hjust = 0.5)
    )
Figure 7: Time plot of differences in the daily prices of ANF stock
acf(df_tsibble$diff |> na.omit(), main = paste("ACF of First Difference of", symbol, "Stock Prices"))
Figure 8: Correlogram of first differences for the daily prices of ANF stock

There is no significant autocorrelation in the differences. They appear to be modeled reasonably well by white noise.

We now compute the mean and standard deviation of the differences.

mean_diff <- df_tsibble$diff |> mean(na.rm = TRUE)
sd_diff <- df_tsibble$diff |> sd(na.rm = TRUE)
n_diff <- df_tsibble$diff |> na.omit() |> length()

The mean of the differences is 0.486. The standard deviation of the differences is 1.732. There are 201 differences.

We can use the t-distribution to create a 95% confidence interval for the drift parameter. The critical \(t\) value is given by qt(0.975, df = 201 - 1), yielding a value of \(t^*_{0.975} = 1.972\).

So, our 95% confidence interval is computed as:

\[ \left( \bar x - t^*_{0.975} \cdot \frac{s}{\sqrt{n}} , ~ \bar x + t^*_{0.975} \cdot \frac{s}{\sqrt{n}} \right) \]

\[ \left( 0.486 - 1.972 \cdot \frac{1.732}{\sqrt{201}} , ~ 0.486 + 1.972 \cdot \frac{1.732}{\sqrt{201}} \right) \]

\[ (0.245 , ~ 0.726) \]

This confidence interval does not contain 0, so we conclude that there is evidence of a positive drift in the price of Abercrombie & Fitch stock over this period.
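The interval calculation can be sketched in R as follows. This example uses a simulated random walk with drift rather than the ANF data; the seed, starting price, drift, and noise standard deviation are hypothetical values. The manual interval follows the formula above, and base R's `t.test()` is used as a cross-check.

```r
# Confidence interval for the drift of a (simulated) random walk with drift.
set.seed(2024)                                # arbitrary seed
n_days <- 202                                 # hypothetical number of trading days
drift  <- 0.5                                 # hypothetical drift parameter
w <- rnorm(n_days - 1, mean = 0, sd = 1.7)    # hypothetical white noise
x <- cumsum(c(100, drift + w))                # x_t = x_{t-1} + drift + w_t, x_1 = 100

d <- diff(x)                                  # first differences estimate drift + noise
mean_diff <- mean(d)                          # estimate of the drift parameter
sd_diff   <- sd(d)
n_diff    <- length(d)

# Manual 95% interval: mean +/- t* x s / sqrt(n)
t_star <- qt(0.975, df = n_diff - 1)
ci_manual <- mean_diff + c(-1, 1) * t_star * sd_diff / sqrt(n_diff)
ci_manual

# Cross-check with the one-sample t interval from t.test()
ci_ttest <- as.numeric(t.test(d)$conf.int)
all.equal(ci_manual, ci_ttest)
```

If the interval excludes 0, the data provide evidence of drift. This reasoning assumes the differences are approximately independent, which is what the correlogram of the differences is used to assess.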

Check Your Understanding

Use the McDonald’s stock price data to do the following.

  • What is the estimate of the drift parameter?
  • What is the standard deviation of the differences?
  • What is the 95% confidence interval for the drift parameter?
  • Is there evidence to suggest that the time series can be modeled as a random walk with drift? Why or why not?

Homework Preview (5 min)

  • Review upcoming homework assignment
  • Clarify questions