Autoregressive Moving Average (ARMA) Models

Chapter 6: Lesson 2

Learning Outcomes

Define autoregressive moving average (ARMA) models
  • Write the equation for an ARMA(p,q) model
  • Express an ARMA model in terms of the backward shift operators for the AR and MA components
  • State facts about ARMA processes related to stationarity, invertibility, special cases, parsimony, and parameter redundancy
  • Use ACF and PACF plots to determine if an AR, MA or ARMA model is appropriate for a time series
Apply an iterative time series modeling process
  • Fit a regression model to capture trend and seasonality
  • Examine residual diagnostic plots to assess autocorrelation
  • Fit an ARMA model to the residuals if needed
  • Check the residuals of the ARMA model for white noise
  • Forecast the original series by combining the regression and ARMA model forecasts

Preparation

  • Read Sections 6.5.1 and 6.6-6.7

Learning Journal Exchange (10 min)

  • Review another student’s journal

  • What would you add to your learning journal after reading another student’s?

  • What would you recommend the other student add to their learning journal?

  • Sign the Learning Journal review sheet for your peer

Class Activity: Introduction to Autoregressive Moving Average (ARMA) Models (10 min)

Autoregressive (AR) Models

In Chapter 4, Lesson 3, we learned the definition of an AR model:

Definition of an Autoregressive (AR) Model

The time series \(\{x_t\}\) is an autoregressive process of order \(p\), denoted as \(AR(p)\), if \[ x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \alpha_3 x_{t-3} + \cdots + \alpha_{p-1} x_{t-(p-1)} + \alpha_p x_{t-p} + w_t ~~~~~~~~~~~~~~~~~~~~~~~ (4.15) \]

where \(\{w_t\}\) is white noise and the \(\alpha_i\) are the model parameters with \(\alpha_p \ne 0\).

The \(AR(p)\) model can be expressed as: \[ \underbrace{\left( 1 - \alpha_1 \mathbf{B} - \alpha_2 \mathbf{B}^2 - \cdots - \alpha_p \mathbf{B}^p \right)}_{\theta_p \left( \mathbf{B} \right)} x_t = w_t \]

Moving Average (MA) Models

The definition of an \(MA(q)\) model is:

Definition of a Moving Average (MA) Model

We say that a time series \(\{x_t\}\) is a moving average process of order \(q\), denoted as \(MA(q)\), if each term in the time series is a linear combination of the current white noise term and the \(q\) most recent past white noise terms.

It is given as: \[ x_t = w_t + \beta_1 w_{t-1} + \beta_2 w_{t-2} + \beta_3 w_{t-3} + \cdots + \beta_{q-1} w_{t-(q-1)} + \beta_q w_{t-q} \]

where \(\{w_t\}\) is white noise with zero mean and variance \(\sigma_w^2\), and the \(\beta_i\) are the model parameters with \(\beta_q \ne 0\).

Written in terms of the backward shift operator, we have \[ x_t = \underbrace{\left( 1 + \beta_1 \mathbf{B} + \beta_2 \mathbf{B}^2 + \beta_3 \mathbf{B}^3 + \cdots + \beta_{q-1} \mathbf{B}^{q-1} + \beta_q \mathbf{B}^{q} \right)}_{\phi_q(\mathbf{B})} w_t \]

Putting the \(AR\) and \(MA\) models together, we get the \(ARMA\) model.

Definition of an Autoregressive Moving Average (ARMA) Model

A time series \(\{ x_t \}\) follows an autoregressive moving average (ARMA) model of order \((p, q)\), which we write as \(ARMA(p,q)\), if it can be written as:

\[ x_t = \underbrace{ \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \alpha_3 x_{t-3} + \cdots + \alpha_{p-1} x_{t-(p-1)} + \alpha_p x_{t-p} }_{ AR(p) ~ \text{model} } + \underbrace{ w_t + \beta_1 w_{t-1} + \beta_2 w_{t-2} + \beta_3 w_{t-3} + \cdots + \beta_{q-1} w_{t-(q-1)} + \beta_q w_{t-q} }_{ MA(q) ~ \text{model} } \] where \(\{w_t\}\) is a white noise process.

We can write this as: \[ \theta_p \left( \mathbf{B} \right) x_t = \phi_q \left( \mathbf{B} \right) w_t \]
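For example, a hypothetical \(ARMA(1,1)\) model with \(\alpha_1 = 0.5\) and \(\beta_1 = 0.3\) (values chosen only for illustration) would be written as

\[ x_t = 0.5 \, x_{t-1} + w_t + 0.3 \, w_{t-1}, \qquad \text{or equivalently} \qquad \left( 1 - 0.5 \mathbf{B} \right) x_t = \left( 1 + 0.3 \mathbf{B} \right) w_t . \]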

Facts about ARMA Processes

The following facts are true for \(ARMA(p,q)\) processes:

  • The ARMA process is stationary if all the roots of \(\theta_p \left( \mathbf{B} \right)\) are greater than 1 in absolute value.
  • The ARMA process is invertible if all the roots of \(\phi_q \left( \mathbf{B} \right)\) are greater than 1 in absolute value.
  • The special case \(ARMA(p,0)\) is the \(AR(p)\) model.
  • The special case \(ARMA(0,q)\) is the \(MA(q)\) model.
  • An \(ARMA\) model will usually require fewer parameters than an \(AR\) or \(MA\) model alone would need to fit the same data. This is the principle of parameter parsimony.
  • If \(\theta\) and \(\phi\) have a common factor, a stationary model can be simplified. This is called parameter redundancy. As an example, the model \[ \left( 1 - \frac{1}{2} \mathbf{B} \right)\left( 1 - \frac{1}{3} \mathbf{B} \right) x_t = \left( 1-\frac{1}{2} \mathbf{B} \right)\left( 1 - \frac{1}{4} \mathbf{B} \right) w_t \] is the same as the model \[ \left( 1 - \frac{1}{3} \mathbf{B} \right) x_t = \left( 1 - \frac{1}{4} \mathbf{B} \right) w_t \]
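As a minimal illustration of checking the stationarity and invertibility conditions numerically, the sketch below uses only base R's polyroot(); the coefficient values are made up for the example and do not come from any model in this lesson. (For fitted models, the glance() output used later in this lesson reports the analogous ar_roots and ma_roots.)

# Minimal sketch: check stationarity and invertibility of a hypothetical
# ARMA(2,1) model with alpha = (0.5, 0.3) and beta = (-0.4).
alpha <- c(0.5, 0.3)   # AR coefficients (illustrative values)
beta  <- c(-0.4)       # MA coefficient (illustrative value)

# theta_p(B) = 1 - 0.5 B - 0.3 B^2  and  phi_q(B) = 1 - 0.4 B
ar_roots <- polyroot(c(1, -alpha))
ma_roots <- polyroot(c(1, beta))

Mod(ar_roots)   # both roots exceed 1, so the process is stationary
Mod(ma_roots)   # the root exceeds 1, so the process is invertible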

Comparison of AR and MA Models

ACF and PACF of \(AR(p)\), \(MA(q)\), and \(ARMA(p,q)\) Processes

We can use the ACF and PACF plots to assess whether an \(AR(p)\), \(MA(q)\), or \(ARMA(p,q)\) model is appropriate. For these processes, we observe the following behavior:

         AR(p)                      MA(q)                      ARMA(p,q)
ACF      Tails off                  Cuts off after lag \(q\)   Tails off
PACF     Cuts off after lag \(p\)   Tails off                  Tails off
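To see these patterns in practice, here is a minimal base-R sketch that simulates an AR(2) and an MA(2) series (the coefficients are arbitrary, illustrative choices) and plots their sample ACF and PACF.

# Simulate an AR(2) and an MA(2) process and compare their ACF/PACF patterns
set.seed(123)
ar2 <- arima.sim(model = list(ar = c(0.6, 0.3)), n = 500)
ma2 <- arima.sim(model = list(ma = c(0.6, 0.3)), n = 500)

par(mfrow = c(2, 2))
acf(ar2,  main = "AR(2): ACF tails off")
pacf(ar2, main = "AR(2): PACF cuts off after lag 2")
acf(ma2,  main = "MA(2): ACF cuts off after lag 2")
pacf(ma2, main = "MA(2): PACF tails off")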

Analyzing Time Series with a Regular Seasonal Pattern

Here are some steps you can use to guide your work as you analyze a time series with a regular seasonal pattern. Even though these are presented linearly, you may find it helpful to iterate between some of the steps as needed. A compact code sketch of the full workflow follows the outline below.

Analyzing a Time Series with a Regular Seasonal Pattern
  • Create a time plot of the series
  • Determine an appropriate model for the trend and seasonality; call this Model 1
    • Regression
    • Holt-Winters
    • Additive / Multiplicative Decomposition
    • Other Techniques
  • Use AIC/AICc/BIC (or other metrics) and your reasoning skills to choose the best model
  • Obtain the residuals
  • Check the residuals of Model 1 for evidence of autocorrelation and non-stationarity
    • Time plot of the residuals
    • ACF plot
    • PACF plot
  • Fit an AR/MA/ARMA model to the (stationary) residuals; call this Model 2
  • Check the residuals of Model 2 for evidence of autocorrelation and normality
    • ACF plot
    • PACF plot
    • Histogram
  • Forecast the trend and seasonality of the time series using Model 1
  • Forecast the residuals of Model 1 using Model 2
  • Add the two forecasts together
  • Plot and summarize the resulting forecast
  • Refine the models above as needed
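To make this outline concrete, here is a minimal sketch of the full cycle on a simulated monthly series. Every object and column name (sim_ts, y, sin1, the ARMA(1,0) order, and so on) is an illustrative assumption; with real data you would substitute your own series, regression terms, and the ARMA order suggested by the ACF/PACF plots and information criteria.

library(fable)
library(feasts)
library(tsibble)
library(dplyr)
library(ggplot2)

# Simulate a monthly series with a linear trend, one harmonic, and AR(1) noise
set.seed(42)
n <- 120
sim_ts <- tibble(
  month = yearmonth("2014 Jan") + 0:(n - 1),
  t     = 1:n,
  sin1  = sin(2 * pi * t / 12),
  cos1  = cos(2 * pi * t / 12),
  y     = 50 + 0.1 * t + 10 * sin1 + as.numeric(arima.sim(list(ar = 0.6), n))
) |>
  as_tsibble(index = month)

# Model 1: regression for the trend and seasonality
model1 <- sim_ts |>
  model(trend_seasonal = TSLM(y ~ t + sin1 + cos1))

# Examine the residuals of Model 1, then fit Model 2 (an ARMA model) to them
resid_ts <- model1 |> residuals() |> select(-.model)
resid_ts |> ACF(.resid) |> autoplot()
model2 <- resid_ts |>
  model(arma = ARIMA(.resid ~ 1 + pdq(1, 0, 0) + PDQ(0, 0, 0)))

# Forecast each piece over the same horizon and add the point forecasts
h <- 24
future <- new_data(sim_ts, h) |>
  mutate(
    t    = max(sim_ts$t) + 1:h,
    sin1 = sin(2 * pi * t / 12),
    cos1 = cos(2 * pi * t / 12)
  )
fc1 <- model1 |> forecast(new_data = future)
fc2 <- model2 |> forecast(h = h)
combined_fc <- tibble(month = fc1$month, forecast = fc1$.mean + fc2$.mean)

# Plot the combined forecast against the simulated data
autoplot(sim_ts, y) +
  geom_line(data = combined_fc, aes(x = month, y = forecast), color = "#E69F00")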

Class Activity: Model for the Residuals from the Rexburg Weather Model (15 min)

Review

We now review the model we built in Chapter 5 Lesson 3 for the monthly average of the daily high temperature in Rexburg, Idaho.

Show the code
weather_df <- rio::import("https://byuistats.github.io/timeseries/data/rexburg_weather_monthly.csv") |>
  mutate(dates = my(date_text)) |>
  filter(dates >= my("1/2008") & dates <= my("12/2023")) |>
  rename(x = avg_daily_high_temp) |>
  mutate(TIME = 1:n()) |>
  mutate(
    cos1 = cos(2 * pi * 1 * TIME/12),
    cos2 = cos(2 * pi * 2 * TIME/12),
    cos3 = cos(2 * pi * 3 * TIME/12),
    cos4 = cos(2 * pi * 4 * TIME/12),
    cos5 = cos(2 * pi * 5 * TIME/12),
    cos6 = cos(2 * pi * 6 * TIME/12),
    sin1 = sin(2 * pi * 1 * TIME/12),
    sin2 = sin(2 * pi * 2 * TIME/12),
    sin3 = sin(2 * pi * 3 * TIME/12),
    sin4 = sin(2 * pi * 4 * TIME/12),
    sin5 = sin(2 * pi * 5 * TIME/12),
    sin6 = sin(2 * pi * 6 * TIME/12)) |>
  mutate(zTIME = (TIME - mean(TIME)) / sd(TIME)) |>
  as_tsibble(index = TIME)

weather_df |>
  as_tsibble(index = dates) |>
  autoplot(.vars = x) +
  geom_smooth(method = "lm", se = FALSE, color = "#F0E442") +
    labs(
      x = "Month",
      y = "Mean Daily High Temperature (Fahrenheit)",
      title = "Time Plot of Mean Daily Rexburg High Temperature by Month",
      subtitle = paste0("(", format(weather_df$dates %>% head(1), "%b %Y"), endash, format(weather_df$dates %>% tail(1), "%b %Y"), ")")
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5),
      plot.subtitle = element_text(hjust = 0.5)
    )
Figure 1: Time plot of the monthly mean daily high temperatures in Rexburg, Idaho, in degrees Fahrenheit

We chose the “Reduced Linear 5” model. For convenience, we reprint the coefficients here.

Show the code
reduced5_linear_lm <- weather_df |>
  model(reduced_linear_5  = TSLM(x ~ zTIME + sin1 + cos1 + sin2 + cos2))

reduced5_linear_lm |>
  tidy() |>
  mutate(sig = p.value < 0.05)
# A tibble: 6 × 7
  .model           term        estimate std.error statistic   p.value sig  
  <chr>            <chr>          <dbl>     <dbl>     <dbl>     <dbl> <lgl>
1 reduced_linear_5 (Intercept)   56.5       0.279    202.   5.06e-220 TRUE 
2 reduced_linear_5 zTIME          0.692     0.280      2.47 1.44e-  2 TRUE 
3 reduced_linear_5 sin1         -16.1       0.395    -40.7  1.66e- 94 TRUE 
4 reduced_linear_5 cos1         -23.7       0.395    -60.1  9.40e-124 TRUE 
5 reduced_linear_5 sin2           1.26      0.395      3.20 1.60e-  3 TRUE 
6 reduced_linear_5 cos2          -2.86      0.395     -7.24 1.13e- 11 TRUE 
Show the code
r5lin_coef_unrounded <- reduced5_linear_lm |>
  tidy() |>
  select(term, estimate, std.error)

r5lin_coef <- r5lin_coef_unrounded |>
  round_df(3)

stats_unrounded <- weather_df |>
  as_tibble() |>
  dplyr::select(TIME) |>
  summarize(mean = mean(TIME), sd = sd(TIME))

stats <- stats_unrounded |>
  round_df(3)

The fitted model is:

\[\begin{align*} x_t &= \hat \beta_0 + \hat \beta_1 \left( \frac{t - \bar t}{s_t} \right) \\ & ~~~~~~~~~~ + \hat \beta_2 \sin \left( \frac{2\pi \cdot 1 t}{12} \right) + \hat \beta_3 \cos \left( \frac{2\pi \cdot 1 t}{12} \right) \\ & ~~~~~~~~~~ + \hat \beta_4 \sin \left( \frac{2\pi \cdot 2 t}{12} \right) + \hat \beta_5 \cos \left( \frac{2\pi \cdot 2 t}{12} \right) \\ &= 56.467 + 0.692 \left( \frac{t - 96.5}{55.57} \right) \\ & ~~~~~~~~~~~~~~~~~ + (-16.068) \sin \left( \frac{2\pi \cdot 1 t}{12} \right) + (-23.708) \cos \left( \frac{2\pi \cdot 1 t}{12} \right) \\ & ~~~~~~~~~~~~~~~~~ + 1.264 \sin \left( \frac{2\pi \cdot 2 t}{12} \right) + (-2.858) \cos \left( \frac{2\pi \cdot 2 t}{12} \right) \\ \end{align*}\]

Show the code
num_months <- weather_df |> 
  as_tibble() |> 
  dplyr::select(TIME) |> 
  tail(1) |> 
  pull()

df <- tibble( TIME = seq(1, num_months, 0.01) ) |>
  mutate(
    cos1 = cos(2 * pi * 1 * TIME/12),
    cos2 = cos(2 * pi * 2 * TIME/12),
    cos3 = cos(2 * pi * 3 * TIME/12),
    cos4 = cos(2 * pi * 4 * TIME/12),
    cos5 = cos(2 * pi * 5 * TIME/12),
    cos6 = cos(2 * pi * 6 * TIME/12),
    sin1 = sin(2 * pi * 1 * TIME/12),
    sin2 = sin(2 * pi * 2 * TIME/12),
    sin3 = sin(2 * pi * 3 * TIME/12),
    sin4 = sin(2 * pi * 4 * TIME/12),
    sin5 = sin(2 * pi * 5 * TIME/12),
    sin6 = sin(2 * pi * 6 * TIME/12)) |>
  mutate(zTIME = (TIME - stats_unrounded$mean) / stats_unrounded$sd) |>  # standardize with the same mean and sd used to fit the model
  as_tsibble(index = TIME)

linear5_ts <- reduced5_linear_lm |>
  forecast(df) |>
  as_tibble() |>
  dplyr::select(TIME, .mean) |>
  rename(value = .mean) |>
  mutate(Model = "Linear 5")

data_ts <- weather_df |> 
  as_tibble() |>
  rename(value = x) |>
  mutate(Model = "Data") |>
  dplyr::select(TIME, value, Model)

combined_ts <- bind_rows(data_ts, linear5_ts) 
point_ts <- combined_ts |> filter(TIME == floor(TIME))

okabe_ito_colors <- c("#000000", "#E69F00")

combined_ts |>
  ggplot(aes(x = TIME, y = value, color = Model)) +
  geom_line() +
  geom_point(data = point_ts, alpha = 0.5) +
  labs(
      x = "Month Number",
      y = "Temperature (Fahrenheit)",
      title = "Monthly Average of Daily High Temperatures in Rexburg",
      subtitle = paste0("(", format(weather_df$dates %>% head(1), "%b %Y"), endash, format(weather_df$dates %>% tail(1), "%b %Y"), ")")
  ) +    
  scale_color_manual(
    values = okabe_ito_colors[1:nrow(combined_ts |> as_tibble() |> select(Model) |> unique())], 
    name = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    legend.position = "top", # Position the legend at the top
    legend.direction = "horizontal" # Set the legend direction to horizontal
  )
Figure 2: Time plot of the monthly mean daily high temperatures in Rexburg, Idaho, in degrees Fahrenheit; the fitted values from the regression model are given in orange

Fitting an ARMA(p,q) Model

First, we create ACF and PACF plots of the residuals from the regression model above.

Show the code
reduced5_linear_lm |>
  residuals() |>
  ACF() |> 
  autoplot(var = .resid)
Figure 3: ACF Plot of the Residuals from the Rexburg Weather Model
Show the code
reduced5_linear_lm |> 
  residuals() |>
  PACF() |>
  autoplot(var = .resid)
Figure 4: PACF Plot of the Residuals from the Rexburg Weather Model
Check Your Understanding
  • What ARMA model does the combined information in the ACF and PACF plots suggest would be appropriate for the residuals of the Rexburg weather model?

Here are summaries of some ARMA models that could be constructed.

model_resid <- reduced5_linear_lm |>
  residuals() |>
  select(-.model) |>
  model(
    auto = ARIMA(.resid ~ 1 + pdq(0:2,0,0:2) + PDQ(0, 0, 0)),
    a000 = ARIMA(.resid ~ 1 + pdq(0,0,0) + PDQ(0, 0, 0)),
    a001 = ARIMA(.resid ~ 1 + pdq(0,0,1) + PDQ(0, 0, 0)),
    a002 = ARIMA(.resid ~ 1 + pdq(0,0,2) + PDQ(0, 0, 0)),
    a100 = ARIMA(.resid ~ 1 + pdq(1,0,0) + PDQ(0, 0, 0)),
    a101 = ARIMA(.resid ~ 1 + pdq(1,0,1) + PDQ(0, 0, 0)),
    a102 = ARIMA(.resid ~ 1 + pdq(1,0,2) + PDQ(0, 0, 0)),
    a200 = ARIMA(.resid ~ 1 + pdq(2,0,0) + PDQ(0, 0, 0)),
    a201 = ARIMA(.resid ~ 1 + pdq(2,0,1) + PDQ(0, 0, 0)),
    a202 = ARIMA(.resid ~ 1 + pdq(2,0,2) + PDQ(0, 0, 0))) 

model_resid |>
  glance() 
Table 1: Comparison of the AIC, AICc, and BIC values for the ARMA models fitted to the residuals of the Rexburg weather model (bold marks the smallest value in each column)
Model   AIC         AICc        BIC
auto    **1056**    **1056.1**  **1065.7**
a000    1062.1      1062.1      1068.6
a001    1056.9      1057        1066.7
a002    1057.9      1058.1      1071
a100    **1056**    **1056.1**  **1065.7**
a101    1057.2      1057.4      1070.2
a102    1059.1      1059.4      1075.4
a200    1057.5      1057.7      1070.5
a201    1059.1      1059.5      1075.4
a202    1059.6      1060        1079.1
Show the code
model_resid |>
  tidy() |>
  filter(.model == "a101") 
# A tibble: 3 × 6
  .model term     estimate std.error statistic p.value
  <chr>  <chr>       <dbl>     <dbl>     <dbl>   <dbl>
1 a101   ar1       0.528       0.262    2.02    0.0451
2 a101   ma1      -0.341       0.288   -1.18    0.238 
3 a101   constant  0.00236     0.176    0.0134  0.989 
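Writing out the a101 fit explicitly (treating the reported constant as the intercept of the ARMA equation), the fitted \(ARMA(1,1)\) model for the regression residuals \(z_t\) is approximately

\[ z_t = 0.0024 + 0.528 \, z_{t-1} + w_t - 0.341 \, w_{t-1} \]

where the constant is essentially zero, as expected for residuals with mean near zero.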
Show the code
model_resid |>
  select(a101) |>
  residuals() |>
  ACF() |>
  autoplot(var = .resid)
Figure 5: ACF plot of the residuals from the \(ARMA(1,1)\) model
Show the code
model_resid |>
  select(a101) |>
  residuals() |>
  PACF() |>
  autoplot(var = .resid)
Figure 6: PACF plot of the residuals from the \(ARMA(1,1)\) model

Small-Group Activity: Industrial Electricity Consumption in Texas (20 min)

These data represent the amount of electricity used each month for industrial applications in Texas.

Show the code
elec_ts <- rio::import("https://byuistats.github.io/timeseries/data/electricity_tx.csv") |>
  dplyr::select(-comments) |>
  mutate(month = my(month)) |>
  mutate(
    t = 1:n(),
    std_t = (t - mean(t)) / sd(t)
  ) |>
  mutate(
    cos1 = cos(2 * pi * 1 * t / 12),
    cos2 = cos(2 * pi * 2 * t / 12),
    cos3 = cos(2 * pi * 3 * t / 12),
    cos4 = cos(2 * pi * 4 * t / 12),
    cos5 = cos(2 * pi * 5 * t / 12),
    cos6 = cos(2 * pi * 6 * t / 12),
    sin1 = sin(2 * pi * 1 * t / 12),
    sin2 = sin(2 * pi * 2 * t / 12),
    sin3 = sin(2 * pi * 3 * t / 12),
    sin4 = sin(2 * pi * 4 * t / 12),
    sin5 = sin(2 * pi * 5 * t / 12)
  ) |>
  as_tsibble(index = month)

elec_plot_raw <- elec_ts |>
    autoplot(.vars = megawatthours) +
    labs(
      x = "Month",
      y = "Megawatt-hours",
      title = "Texas' Industrial Electricity Use"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5)
    )

elec_plot_log <- elec_ts |>
    autoplot(.vars = log(megawatthours)) +
    labs(
      x = "Month",
      y = "log(Megwatt-hours)",
      title = "Log of Texas' Industrial Electricity Use"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5)
    )

elec_plot_raw | elec_plot_log

Check Your Understanding

Use the Texas industrial electricity consumption data to do the following.

  • Select an appropriate model to fit the time series using the AIC, AICc, or BIC criteria.
  • Determine the best \(ARMA(p,q)\) model for the residuals.

Small Group Activity: Fitting an ARMA(p,q) Model to the Retail Data (30 min)

Apply what you have learned in this lesson to the retail sales data. In particular, consider full-service restaurants (NAICS code 722511).

Check Your Understanding

Use the total retail sales for full-service restaurants to do the following.

  • Select an appropriate model to fit the time series using the AIC, AICc, or BIC criteria.
  • Determine the best \(ARMA(p,q)\) model for the residuals.
  • Forecast the data for the next 5 years using the regression model.
  • Forecast the residuals for the next 5 years using the ARMA model.
  • Sum the two forecasted time series and plot the combined series against the original retail sales data.

Here is some code you can use to help you get started.

Show the code
# Read in retail sales data for "Full-Service Restaurants"
retail_ts <- rio::import("https://byuistats.github.io/timeseries/data/retail_by_business_type.parquet") |>
  filter(naics == 722511) |>
  mutate(
    month = yearmonth(as_date(month)),
    month_number = month(month)
  ) |>
  mutate(t = 1:n()) |>
  mutate(std_t = (t - mean(t)) / sd(t)) |>
  mutate(
    cos1 = cos(2 * pi * 1 * t / 12),
    cos2 = cos(2 * pi * 2 * t / 12),
    cos3 = cos(2 * pi * 3 * t / 12),
    cos4 = cos(2 * pi * 4 * t / 12),
    cos5 = cos(2 * pi * 5 * t / 12),
    cos6 = cos(2 * pi * 6 * t / 12),
    sin1 = sin(2 * pi * 1 * t / 12),
    sin2 = sin(2 * pi * 2 * t / 12),
    sin3 = sin(2 * pi * 3 * t / 12),
    sin4 = sin(2 * pi * 4 * t / 12),
    sin5 = sin(2 * pi * 5 * t / 12)
  ) |>
  mutate(
    in_great_recession = ifelse(month >= yearmonth(my("December 2007")) & month <= yearmonth(my("June 2009")), 1, 0),
    after_great_recession = ifelse(month > yearmonth(my("June 2009")), 1, 0)
  ) |>
  as_tsibble(index = month)

retail_plot_raw <- retail_ts |>
    autoplot(.vars = sales_millions) +
    labs(
      x = "Month",
      y = "sales_millions",
      title = "Other General Merchandise Sales"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5)
    )

retail_plot_log <- retail_ts |>
    autoplot(.vars = log(sales_millions)) +
    labs(
      x = "Month",
      y = "log(sales_millions)",
      title = "Logarithm of Other Gen. Merch. Sales"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5)
    )

retail_plot_raw | retail_plot_log
Figure 7: Time plot of the time series (left) and the natural logarithm of the time series (right)
Show the code
retail_final_lm <- retail_ts |>
  model(retail_full = TSLM(log(sales_millions) ~  
      (in_great_recession * std_t) + (after_great_recession * std_t) 
      + (in_great_recession * I(std_t^2) ) + (after_great_recession * I(std_t^2) )
      + (in_great_recession * I(std_t^3) ) + (after_great_recession * I(std_t^3) )
      + sin1 + cos1 + sin2 + cos2 + sin3 + cos3
      + sin4 + cos4 + sin5 + cos5 + cos6 # sin6 is identically zero for monthly data, so it is omitted
    )
    ) 

retail_final_lm |>
  tidy() |>
  mutate(sig = p.value < 0.05) |>
  knitr::kable(format = "html", align='ccccccc', escape = FALSE, width = NA, row.names = FALSE) |>
  kable_styling(full_width = FALSE, "striped")
Table 2: Coefficients for the fitted model for the total sales in full-service restaurants in the United States
.model term estimate std.error statistic p.value sig
retail_full (Intercept) 9.6959936 0.0062100 1561.3586384 0.0000000 TRUE
retail_full in_great_recession 0.1057700 0.1187801 0.8904693 0.3738957 FALSE
retail_full std_t 0.4905760 0.0326870 15.0083105 0.0000000 TRUE
retail_full after_great_recession -0.1556298 0.0312565 -4.9791109 0.0000011 TRUE
retail_full I(std_t^2) -0.0163208 0.0458329 -0.3560935 0.7220097 FALSE
retail_full I(std_t^3) -0.0102056 0.0178717 -0.5710498 0.5683744 FALSE
retail_full sin1 -0.0000252 0.0020362 -0.0123921 0.9901207 FALSE
retail_full cos1 -0.0360436 0.0020595 -17.5008207 0.0000000 TRUE
retail_full sin2 -0.0037502 0.0020154 -1.8607818 0.0637096 FALSE
retail_full cos2 0.0078111 0.0020169 3.8728011 0.0001309 TRUE
retail_full sin3 -0.0116397 0.0020098 -5.7913822 0.0000000 TRUE
retail_full cos3 0.0213598 0.0020189 10.5801523 0.0000000 TRUE
retail_full sin4 0.0001229 0.0020100 0.0611468 0.9512812 FALSE
retail_full cos4 0.0101346 0.0020145 5.0308564 0.0000008 TRUE
retail_full sin5 0.0277666 0.0020122 13.7990897 0.0000000 TRUE
retail_full cos5 0.0203802 0.0020132 10.1232890 0.0000000 TRUE
retail_full cos6 0.0038084 0.0014225 2.6772474 0.0078131 TRUE
retail_full in_great_recession:std_t -2.5718611 3.0873172 -0.8330408 0.4054549 FALSE
retail_full std_t:after_great_recession 0.0198043 0.1413695 0.1400891 0.8886794 FALSE
retail_full in_great_recession:I(std_t^2) 14.8230840 24.4731229 0.6056883 0.5451593 FALSE
retail_full after_great_recession:I(std_t^2) 0.1215930 0.1888445 0.6438792 0.5201238 FALSE
retail_full in_great_recession:I(std_t^3) -35.1756536 60.1509944 -0.5847892 0.5591094 FALSE
retail_full after_great_recession:I(std_t^3) -0.0653848 0.0766366 -0.8531788 0.3942105 FALSE
Show the code
retail_resid_df <- retail_final_lm |> 
  residuals() |> 
  as_tibble() |> 
  dplyr::select(.resid) |>
  rename(x = .resid) 
  
retail_resid_df |>
  mutate(density = dnorm(x, mean(retail_resid_df$x), sd(retail_resid_df$x))) |>
  ggplot(aes(x = x)) +
    geom_histogram(aes(y = after_stat(density)),
        color = "white", fill = "#56B4E9", binwidth = 0.02) +
    geom_line(aes(x = x, y = density)) +
    theme_bw() +
    labs(
      x = "Values",
      y = "Frequency",
      title = "Histogram of Residuals"
    ) +
    theme(
      plot.title = element_text(hjust = 0.5)
    )
Figure 8: Histogram of the residuals from the fitted regression model

The skewness of the residuals is computed with the skewness command; a value near zero indicates an approximately symmetric distribution:

Show the code
skewness(retail_resid_df$x)
[1] 0.01014588
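Beyond the histogram and the skewness statistic, a normal quantile-quantile plot gives another quick visual check of normality. Here is a minimal base-R sketch using the residuals computed above.

# Minimal sketch: normal Q-Q plot of the regression residuals.
# If the residuals are roughly normal, the points fall close to the reference line.
qqnorm(retail_resid_df$x, main = "Normal Q-Q Plot of Residuals")
qqline(retail_resid_df$x)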
Show the code
retail_final_lm |>
  residuals() |>
  ACF() |> 
  autoplot(var = .resid)
Figure 9: Correlogram of the residuals from the fitted regression model
Show the code
retail_final_lm |>
  residuals() |>
  PACF() |> 
  autoplot(var = .resid)
Figure 10: Partial correlogram of the residuals from the fitted regression model
Show the code
################################################################
#   Be sure to adjust this model to match what you did above   #
################################################################
#
# Model without any seasonal trends
retail_no_seasonal_lm <- retail_ts |>
  model(retail_full_quad = TSLM(log(sales_millions) ~  
      (in_great_recession * std_t) + (after_great_recession * std_t) 
      + (in_great_recession * I(std_t^2) ) + (after_great_recession * I(std_t^2) )
      + (in_great_recession * I(std_t^3) ) + (after_great_recession * I(std_t^3) )
    )
    )

retail_ts |> 
  autoplot(.vars = sales_millions) +
  # ggplot(mapping = aes(x = month, y = sales_millions)) +
  # geom_line() +
  geom_line(data = augment(retail_no_seasonal_lm),
            aes(x = month, y = .fitted),
            color = "#E69F00",
            linewidth = 1
            ) +
  geom_line(data = augment(retail_final_lm),
            aes(x = month, y = .fitted),
            color = "blue",
            alpha = 0.7,
            # linewidth = 1
            ) +
  labs(
    x = "Month",
    y = "Sales (Millions of U.S. Dollars)",
    title = paste0(retail_ts$business[1], " (", retail_ts$naics[1], ")")
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
Figure 11: Time plot of the retail sales data for full-service restaurants; the regression curve without the seasonal terms is given in orange; the predicted values from the model are given in blue
Show the code
model_resid <- retail_final_lm |>
  residuals() |>
  select(-.model) |>
  model(
    auto = ARIMA(.resid ~ 1 + pdq(0:5,0,0:5) + PDQ(0, 0, 0)),
    a000 = ARIMA(.resid ~ 1 + pdq(0,0,0) + PDQ(0, 0, 0)),
    a001 = ARIMA(.resid ~ 1 + pdq(0,0,1) + PDQ(0, 0, 0)),
    a002 = ARIMA(.resid ~ 1 + pdq(0,0,2) + PDQ(0, 0, 0)),
    a003 = ARIMA(.resid ~ 1 + pdq(0,0,3) + PDQ(0, 0, 0)),
    a004 = ARIMA(.resid ~ 1 + pdq(0,0,4) + PDQ(0, 0, 0)),
    a005 = ARIMA(.resid ~ 1 + pdq(0,0,5) + PDQ(0, 0, 0)),
    a100 = ARIMA(.resid ~ 1 + pdq(1,0,0) + PDQ(0, 0, 0)),
    a101 = ARIMA(.resid ~ 1 + pdq(1,0,1) + PDQ(0, 0, 0)),
    a102 = ARIMA(.resid ~ 1 + pdq(1,0,2) + PDQ(0, 0, 0)),
    a103 = ARIMA(.resid ~ 1 + pdq(1,0,3) + PDQ(0, 0, 0)),
    a104 = ARIMA(.resid ~ 1 + pdq(1,0,4) + PDQ(0, 0, 0)),
    a105 = ARIMA(.resid ~ 1 + pdq(1,0,5) + PDQ(0, 0, 0)), 
    a200 = ARIMA(.resid ~ 1 + pdq(2,0,0) + PDQ(0, 0, 0)),
    a201 = ARIMA(.resid ~ 1 + pdq(2,0,1) + PDQ(0, 0, 0)),
    a202 = ARIMA(.resid ~ 1 + pdq(2,0,2) + PDQ(0, 0, 0)),
    a203 = ARIMA(.resid ~ 1 + pdq(2,0,3) + PDQ(0, 0, 0)),
    a204 = ARIMA(.resid ~ 1 + pdq(2,0,4) + PDQ(0, 0, 0)),
    a205 = ARIMA(.resid ~ 1 + pdq(2,0,5) + PDQ(0, 0, 0)),
    a300 = ARIMA(.resid ~ 1 + pdq(3,0,0) + PDQ(0, 0, 0)),
    a301 = ARIMA(.resid ~ 1 + pdq(3,0,1) + PDQ(0, 0, 0)),
    a302 = ARIMA(.resid ~ 1 + pdq(3,0,2) + PDQ(0, 0, 0)),
    a303 = ARIMA(.resid ~ 1 + pdq(3,0,3) + PDQ(0, 0, 0)),
    a304 = ARIMA(.resid ~ 1 + pdq(3,0,4) + PDQ(0, 0, 0)),
    a305 = ARIMA(.resid ~ 1 + pdq(3,0,5) + PDQ(0, 0, 0)), 
    a400 = ARIMA(.resid ~ 1 + pdq(4,0,0) + PDQ(0, 0, 0)),
    a401 = ARIMA(.resid ~ 1 + pdq(4,0,1) + PDQ(0, 0, 0)),
    a402 = ARIMA(.resid ~ 1 + pdq(4,0,2) + PDQ(0, 0, 0)),
    a403 = ARIMA(.resid ~ 1 + pdq(4,0,3) + PDQ(0, 0, 0)),
    a404 = ARIMA(.resid ~ 1 + pdq(4,0,4) + PDQ(0, 0, 0)),
    a405 = ARIMA(.resid ~ 1 + pdq(4,0,5) + PDQ(0, 0, 0)),
    a500 = ARIMA(.resid ~ 1 + pdq(5,0,0) + PDQ(0, 0, 0)),
    a501 = ARIMA(.resid ~ 1 + pdq(5,0,1) + PDQ(0, 0, 0)),
    a502 = ARIMA(.resid ~ 1 + pdq(5,0,2) + PDQ(0, 0, 0)),
    a503 = ARIMA(.resid ~ 1 + pdq(5,0,3) + PDQ(0, 0, 0)),
    a504 = ARIMA(.resid ~ 1 + pdq(5,0,4) + PDQ(0, 0, 0)),
    a505 = ARIMA(.resid ~ 1 + pdq(5,0,5) + PDQ(0, 0, 0))
  )


# This command indicates which model was selected by "auto"
# model_resid |> select(auto)

model_resid |> 
  glance() |>
  # uncomment the next line to only show the "best" models according to these criteria
  # filter(AIC == min(AIC) | AICc == min(AICc) | BIC == min(BIC) | .model == "auto") |>
  knitr::kable(format = "html", align='ccccccc', escape = FALSE, width = NA, row.names = FALSE) |>
  kable_styling(full_width = FALSE, "striped")
Table 3: Results from the process of fitting an ARMA model to the residuals of the model for the total sales in full-service restaurants in the United States
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
auto 0.0003644 841.5138 -1667.028 -1666.631 -1635.677 -0.5762646+0.8613026i, -0.5762646-0.8613026i, 2.2627735-0.8408307i, 2.2627735+0.8408307i -0.5154592+0.9948076i, -0.5154592-0.9948076i
a000 0.0005764 762.4127 -1520.825 -1520.793 -1512.988
a001 0.0004803 793.5159 -1581.032 -1580.966 -1569.275 -2.382155+0i
a002 0.0004782 794.7146 -1581.429 -1581.320 -1565.754 -1.822802+2.494324i, -1.822802-2.494324i
a003 0.0004109 820.4694 -1630.939 -1630.775 -1611.344 0.232624+1.346072i, -1.379822+0.000000i, 0.232624-1.346072i
a004 0.0004086 821.8842 -1631.768 -1631.538 -1608.255 0.090446+1.334121i, -1.288012-0.000000i, 0.090446-1.334121i, 4.263380+0.000000i
a005 0.0004097 821.8871 -1629.774 -1629.466 -1602.342 0.096742+1.327683i, -1.295555-0.000000i, 0.096742-1.327683i, 3.658704-0.000000i, -19.273408-0.000000i
a100 0.0004615 800.2336 -1594.467 -1594.402 -1582.710 2.220227+0i
a101 0.0004590 801.5727 -1595.145 -1595.036 -1579.470 1.709493+0i 6.008737+0i
a102 0.0004537 803.9495 -1597.899 -1597.735 -1578.305 2.092518+0i 0.291385+2.033057i, 0.291385-2.033057i
a103 0.0004091 821.6652 -1631.330 -1631.100 -1607.817 -4.593936+0i 0.114692+1.365411i, -1.273789-0.000000i, 0.114692-1.365411i
a104 0.0003923 827.7659 -1641.532 -1641.224 -1614.100 1.114879+0i 0.202209+1.31577i, 0.202209-1.31577i, 1.000001-0.00000i, -1.352297+0.00000i
a105 0.0003929 828.0239 -1640.048 -1639.651 -1608.697 1.103023+0i 1.000002-0.000000i, -1.312010+0.000000i, 0.146005-1.323116i, 0.146005+1.323116i, 9.340950+0.000000i
a200 0.0004586 801.7290 -1595.458 -1595.349 -1579.782 1.741604+0i, -6.053939+0i
a201 0.0004597 801.7718 -1593.544 -1593.380 -1573.949 1.707303+0i, -8.322029+0i 17.35635+0i
a202 0.0004424 808.6177 -1605.235 -1605.005 -1581.722 1.11252+1.235597i, 1.11252-1.235597i 0.478362+1.381186i, 0.478362-1.381186i
a203 0.0004020 824.8516 -1635.703 -1635.395 -1608.271 -0.603528+1.04516i, -0.603528-1.04516i -0.341680+1.04375i, -1.490094-0.00000i, -0.341680-1.04375i
a204 0.0003928 828.0543 -1640.109 -1639.712 -1608.757 1.103352+0i, -8.074288+0i 1.000000+0.00000i, -1.299607-0.00000i, 0.140388-1.33086i, 0.140388+1.33086i
a205 0.0003944 827.7675 -1637.535 -1637.038 -1602.265 1.114557+0i, -1.380078+0i 1.000000+0.000000i, -1.304935+0.000000i, 0.200885-1.317038i, 0.200885+1.317038i, -1.439098+0.000000i
a300 0.0004588 802.1164 -1594.233 -1594.069 -1574.638 1.561490-0.000000i, -1.559974+3.283859i, -1.559974-3.283859i
a301 0.0004395 809.7261 -1607.452 -1607.222 -1583.939 1.489169-0.000000i, -1.438632+0.982914i, -1.438632-0.982914i -1.421429+0i
a302 0.0003535 844.0552 -1674.110 -1673.803 -1646.678 -0.5774684+0.8165105i, -0.5774684-0.8165105i, 1.7298070-0.0000000i -0.5791569+0.8214568i, -0.5791569-0.8214568i
a303 0.0003668 840.4061 -1664.812 -1664.416 -1633.461 -0.5727277+0.8592709i, -0.5727277-0.8592709i, 1.7714851-0.0000000i -0.5205722+0.9760787i, -0.5205722-0.9760787i, -6.1372611-0.0000000i
a304 0.0003475 849.1941 -1680.388 -1679.891 -1645.118 -0.5791893+0.8188125i, -0.5791893-0.8188125i, 2.5938880+0.0000000i -0.6126397+0.8363729i, -0.6126397-0.8363729i, -0.2481793-1.7853598i, -0.2481793+1.7853598i
a305 0.0003467 849.2092 -1678.418 -1677.809 -1639.230 1.0685664-0.0000000i, -0.5815414+0.8262367i, -0.5815414-0.8262367i 1.0000003-0.000000i, -0.6708443+0.861622i, -0.6708443-0.861622i, -0.4789438-1.295557i, -0.4789438+1.295557i
a400 0.0003926 828.4866 -1644.973 -1644.743 -1621.460 1.074908+0.622598i, -0.815559+1.008846i, -0.815559-1.008846i, 1.074908-0.622598i
a401 0.0003936 828.5076 -1643.015 -1642.707 -1615.583 1.079994+0.6285086i, -0.818089+0.9991499i, -0.818089-0.9991499i, 1.079994-0.6285086i -39.24509+0i
a402 0.0003644 841.5138 -1667.028 -1666.631 -1635.677 -0.5762646+0.8613026i, -0.5762646-0.8613026i, 2.2627735-0.8408307i, 2.2627735+0.8408307i -0.5154592+0.9948076i, -0.5154592-0.9948076i
a403 0.0003546 846.4790 -1674.958 -1674.461 -1639.688 1.192965+0.5326171i, -0.576099+0.8530305i, -0.576099-0.8530305i, 1.192965-0.5326171i -0.5438287+0.9807507i, -0.5438287-0.9807507i, 1.5196159+0.0000000i
a404 0.0003349 854.9148 -1689.830 -1689.220 -1650.641 1.0067552+0.5651317i, -0.5776288+0.8165889i, -0.5776288-0.8165889i, 1.0067552-0.5651317i 1.2962822+0.9424546i, -0.5840364+0.8220893i, -0.5840364-0.8220893i, 1.2962822-0.9424546i
a405 0.0003282 859.1482 -1696.296 -1695.563 -1653.189 0.8764700+0.5283505i, -0.5780574+0.8170427i, -0.5780574-0.8170427i, 0.8764700-0.5283505i 0.8854442+0.6218356i, -0.5844324+0.8275889i, -0.5844324-0.8275889i, 0.8854442-0.6218356i, -3.5686492+0.0000000i
a500 0.0003936 828.5134 -1643.027 -1642.719 -1615.595 1.0816019+0.630746i, -0.8181991+0.996680i, -0.8181991-0.996680i, 1.0816019-0.630746i, 30.0866777+0.000000i
a501 0.0003943 828.6939 -1641.388 -1640.991 -1610.037 1.063879+0.614274i, -0.782909+1.024026i, -0.782909-1.024026i, 1.063879-0.614274i, -1.633662-0.000000i -1.791808+0i
a502 0.0003465 849.6644 -1681.329 -1680.832 -1646.059 1.3231535+0.8089297i, -0.5790055+0.8187047i, -0.5790055-0.8187047i, 1.3231535-0.8089297i, -1.8902606+0.0000000i -0.6099102+0.8427657i, -0.6099102-0.8427657i
a503 0.0003371 853.7740 -1687.548 -1686.939 -1648.359 1.1212944+0.5571492i, -0.5777218+0.8165442i, -0.5777218-0.8165442i, 1.1212944-0.5571492i, -2.8445945-0.0000000i -0.5853707+0.822848i, -0.5853707-0.822848i, 1.8347464-0.000000i
a504 0.0003174 863.4734 -1704.947 -1704.214 -1661.839 0.8683242+0.5235239i, -0.5775260+0.8164740i, -0.5775260-0.8164740i, 0.8683242-0.5235239i, 2.5819016+0.0000000i 0.8814016+0.5774634i, -0.5799805+0.8211971i, -0.5799805-0.8211971i, 0.8814016-0.5774634i
a505 0.0003172 864.3575 -1704.715 -1703.846 -1657.688 0.8686868+0.5163216i, -0.5776237+0.8164885i, -0.5776237-0.8164885i, 0.8686868-0.5163216i, 1.6856268-0.0000000i 0.8977736+0.5532624i, -0.5820940+0.8217810i, -0.5820940-0.8217810i, 0.8977736-0.5532624i, 4.7797944+0.0000000i
Show the code
model_resid |>
  select(a504) |>
  residuals() |>
  ACF() |> 
  autoplot(var = .resid)
Figure 12: Correlogram of the residuals from the ARMA model fitted to the residuals from the regression model
Show the code
model_resid |>
  select(a504) |>
  residuals() |>
  PACF() |> 
  autoplot(var = .resid)
Figure 13: Partial correlogram of the residuals from the ARMA model fitted to the residuals from the regression model

Homework Preview (5 min)

  • Review upcoming homework assignment
  • Clarify questions
Download Homework