The time series \(\{x_t\}\) is an autoregressive process of order \(p\), denoted as \(AR(p)\), if \[
x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \alpha_3 x_{t-3} + \cdots + \alpha_{p-1} x_{t-(p-1)} + \alpha_p x_{t-p} + w_t \tag{4.15}
\]
where \(\{w_t\}\) is white noise and the \(\alpha_i\) are the model parameters with \(\alpha_p \ne 0\).
The \(AR(p)\) model can be expressed as: \[
\underbrace{\left( 1 - \alpha_1 \mathbf{B} - \alpha_2 \mathbf{B}^2 - \cdots - \alpha_p \mathbf{B}^p \right)}_{\theta_p \left( \mathbf{B} \right)} x_t = w_t
\]
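As a concrete illustration, an \(AR(p)\) series can be generated directly from the defining recursion. The sketch below (Python with NumPy; the function name and coefficient values are illustrative choices, not from the text) simulates a stationary \(AR(2)\) process:

```python
import numpy as np

def simulate_ar(alphas, n, burn=200, seed=0):
    """Simulate an AR(p) process x_t = alpha_1 x_{t-1} + ... + alpha_p x_{t-p} + w_t
    with standard normal white noise; an initial burn-in is discarded so the
    retained series is approximately stationary."""
    rng = np.random.default_rng(seed)
    p = len(alphas)
    w = rng.normal(size=n + burn)
    x = np.zeros(n + burn)
    for t in range(p, n + burn):
        # x[t - p:t][::-1] is (x_{t-1}, x_{t-2}, ..., x_{t-p})
        x[t] = np.dot(alphas, x[t - p:t][::-1]) + w[t]
    return x[burn:]

# AR(2) with alpha_1 = 0.6 and alpha_2 = 0.2, a stationary choice
x = simulate_ar([0.6, 0.2], n=500)
```

The burn-in matters: starting the recursion from zeros and discarding early values removes the dependence on the arbitrary initial conditions.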
Moving Average (MA) Models
The definition of an \(MA(q)\) model is:
Definition of a Moving Average (MA) Model
We say that a time series \(\{x_t\}\) is a moving average process of order \(q\), denoted as \(MA(q)\), if each term in the time series is a linear combination of the current white noise term and the \(q\) most recent past white noise terms.
It is given as: \[
x_t = w_t + \beta_1 w_{t-1} + \beta_2 w_{t-2} + \beta_3 w_{t-3} + \cdots + \beta_{q-1} w_{t-(q-1)} + \beta_q w_{t-q}
\]
where \(\{w_t\}\) is white noise with zero mean and variance \(\sigma_w^2\), and the \(\beta_i\) are the model parameters with \(\beta_q \ne 0\).
Written in terms of the backward shift operator, we have \[
x_t = \underbrace{\left( 1 + \beta_1 \mathbf{B} + \beta_2 \mathbf{B}^2 + \beta_3 \mathbf{B}^3 + \cdots + \beta_{q-1} \mathbf{B}^{q-1} + \beta_q \mathbf{B}^{q} \right)}_{\phi_q(\mathbf{B})} w_t
\]
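Because an \(MA(q)\) series is simply a linear combination of white noise terms, it can be generated with a convolution. A brief sketch (Python with NumPy; the coefficient values are illustrative), which also checks the process against the known \(MA(q)\) variance \(\sigma_w^2 (1 + \beta_1^2 + \cdots + \beta_q^2)\):

```python
import numpy as np

rng = np.random.default_rng(1)
betas = [0.9, 0.4]            # beta_1, beta_2 for an MA(2)
w = rng.normal(size=10_000)   # white noise with sigma_w^2 = 1

# x_t = w_t + beta_1 w_{t-1} + beta_2 w_{t-2} is a convolution of the
# noise with the coefficient sequence (1, beta_1, beta_2).
x = np.convolve(w, [1.0] + betas)[: len(w)]

# Theoretical variance of an MA(q) with unit noise variance:
theory_var = 1 + sum(b * b for b in betas)
```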
Putting the \(AR\) and \(MA\) models together, we get the \(ARMA\) model.
Definition of an Autoregressive Moving Average (ARMA) Model
A time series \(\{ x_t \}\) follows an autoregressive moving average (ARMA) model of order \((p, q)\), which we write as \(ARMA(p,q)\), if it can be written as: \[
x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \cdots + \alpha_p x_{t-p} + w_t + \beta_1 w_{t-1} + \beta_2 w_{t-2} + \cdots + \beta_q w_{t-q}
\]
where \(\{w_t\}\) is white noise and the \(\alpha_i\) and \(\beta_i\) are the model parameters.
In terms of the backward shift operator, we can write this as: \[
\theta_p \left( \mathbf{B} \right) x_t
=
\phi_q \left( \mathbf{B} \right) w_t
\]
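The operator equation above is shorthand for a difference equation in which \(x_t\) depends on its own \(p\) most recent values and the \(q\) most recent noise terms, so it can be simulated directly. A sketch (Python with NumPy; the parameter values are illustrative):

```python
import numpy as np

def simulate_arma(alphas, betas, n, seed=2):
    """Simulate theta_p(B) x_t = phi_q(B) w_t from its difference equation,
    treating all values before t = 0 as zero."""
    rng = np.random.default_rng(seed)
    p, q = len(alphas), len(betas)
    w = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(n):
        ar_part = sum(alphas[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma_part = sum(betas[j] * w[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x[t] = ar_part + w[t] + ma_part
    return x

# ARMA(1,1) with alpha_1 = 0.5 and beta_1 = 0.4
x = simulate_arma([0.5], [0.4], n=2_000)
```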
Facts about ARMA Processes
The following facts are true for \(ARMA(p,q)\) processes:
The ARMA process is stationary if all the roots of \(\theta_p \left( \mathbf{B} \right) = 0\) are greater than 1 in absolute value.
The ARMA process is invertible if all the roots of \(\phi_q \left( \mathbf{B} \right) = 0\) are greater than 1 in absolute value.
The special case \(ARMA(p,0)\) is the \(AR(p)\) model.
The special case \(ARMA(0,q)\) is the \(MA(q)\) model.
An \(ARMA\) model will usually require fewer parameters than a pure \(AR\) model or a pure \(MA\) model fit to the same data. This is called parameter parsimony.
If \(\theta\) and \(\phi\) have a common factor, a stationary model can be simplified. This is called parameter redundancy. As an example, the model \[
\left( 1 - \frac{1}{2} \mathbf{B} \right)\left( 1 - \frac{1}{3} \mathbf{B} \right) x_t
=
\left( 1-\frac{1}{2} \mathbf{B} \right)\left( 1 - \frac{1}{4} \mathbf{B} \right) w_t
\] is the same as the model \[
\left( 1 - \frac{1}{3} \mathbf{B} \right) x_t
=
\left( 1 - \frac{1}{4} \mathbf{B} \right) w_t
\]
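These facts can be checked numerically. The sketch below (Python with NumPy; `arma_sim` is a hypothetical helper written here for illustration) expands the polynomials of the redundant model, confirms that all roots lie outside the unit circle (so the model is stationary and invertible), and shows that the redundant and simplified forms generate identical series from the same white noise:

```python
import numpy as np

def arma_sim(ar_poly, ma_poly, w):
    """Simulate theta(B) x_t = phi(B) w_t from its difference equation,
    treating all values before t = 0 as zero. ar_poly holds the
    coefficients (1, -alpha_1, ..., -alpha_p); ma_poly holds
    (1, beta_1, ..., beta_q)."""
    x = np.zeros(len(w))
    for t in range(len(w)):
        acc = sum(ma_poly[j] * w[t - j] for j in range(len(ma_poly)) if t >= j)
        acc -= sum(ar_poly[i] * x[t - i] for i in range(1, len(ar_poly)) if t >= i)
        x[t] = acc / ar_poly[0]
    return x

# The redundant model: both sides share the factor (1 - 1/2 B).
ar_full = np.polymul([1, -1/2], [1, -1/3])   # (1 - 1/2 B)(1 - 1/3 B)
ma_full = np.polymul([1, -1/2], [1, -1/4])   # (1 - 1/2 B)(1 - 1/4 B)

# Stationarity/invertibility: every root (in B) must exceed 1 in
# absolute value. np.roots expects the highest power first.
ar_roots = np.roots(ar_full[::-1])
ma_roots = np.roots(ma_full[::-1])

# Pass the same white noise through both forms of the model.
rng = np.random.default_rng(3)
w = rng.normal(size=300)
x_full = arma_sim(ar_full, ma_full, w)
x_reduced = arma_sim([1, -1/3], [1, -1/4], w)
```

Here the AR roots are \(B = 2, 3\) and the MA roots are \(B = 2, 4\), and `x_full` matches `x_reduced`, confirming that the common factor \((1 - \tfrac{1}{2}\mathbf{B})\) carries no information.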
Comparison of AR and MA Models
ACF and PACF Patterns for \(AR\), \(MA\), and \(ARMA\) Processes
We can use the ACF and PACF plots to assess whether an \(AR(p)\), \(MA(q)\), or \(ARMA(p,q)\) model is appropriate. For these processes, we observe the following:
|      | AR(p) | MA(q) | ARMA(p,q) |
|------|-------|-------|-----------|
| ACF  | Tails off | Cuts off after lag \(q\) | Tails off |
| PACF | Cuts off after lag \(p\) | Tails off | Tails off |
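These patterns are easy to verify numerically. The sketch below (Python with NumPy; `sample_acf` is a hypothetical helper written here, and \(\beta_1 = 0.8\) is an illustrative value) shows that the sample ACF of an \(MA(1)\) process is large at lag 1 and cuts off afterward:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(4)
w = rng.normal(size=5_000)
x = w.copy()
x[1:] += 0.8 * w[:-1]          # MA(1) with beta_1 = 0.8

r = sample_acf(x, 5)
# For this MA(1), the lag-1 autocorrelation is theoretically
# 0.8 / (1 + 0.8^2), about 0.49, and the ACF is near 0 beyond lag 1.
```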
Analyzing Time Series with a Regular Seasonal Pattern
Here are some steps you can use to guide your work as you analyze time series with a regular seasonal pattern. Even though these are presented linearly, you may find it helpful to iterate between some of the steps as needed.
Analyzing a Time Series with a Regular Seasonal Pattern
1. Create a time plot of the series.
2. Determine an appropriate model for the trend and seasonality; call this Model 1. Candidate approaches include:
    - Regression
    - Holt-Winters
    - Additive / multiplicative decomposition
    - Other techniques

    Use AIC/AICc/BIC (or other metrics) and your reasoning skills to choose the best model.
3. Obtain the residuals of Model 1.
4. Check the residuals of Model 1 for evidence of autocorrelation and non-stationarity:
    - Time plot of the residuals
    - ACF plot
    - PACF plot
5. Fit an AR/MA/ARMA model to the (stationary) residuals; call this Model 2.
6. Check the residuals of Model 2 for evidence of autocorrelation and normality:
    - ACF plot
    - PACF plot
    - Histogram
7. Forecast the trend and seasonality of the time series using Model 1.
8. Forecast the residuals of Model 1 using Model 2.
9. Add the two forecasts together.
10. Plot and summarize the resulting forecast.
11. Refine the models above as needed.
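To make the workflow concrete, here is a minimal end-to-end sketch on synthetic monthly data (written in Python with NumPy rather than the R tools used in this course; every name and number is illustrative). Model 1 is a regression on a trend plus month indicators, Model 2 is an AR(1) fit to its residuals, and the final forecast is the sum of the two:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic monthly series: linear trend + seasonal pattern + AR(1) noise.
n, period = 144, 12
t = np.arange(n)
season = 10 * np.sin(2 * np.pi * t / period)
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = 0.6 * eps[i - 1] + rng.normal()
x = 50 + 0.05 * t + season + eps

# Model 1: regression on trend + month indicator (dummy) variables.
X = np.column_stack(
    [np.ones(n), t] + [(t % period == m).astype(float) for m in range(1, period)]
)
coef, *_ = np.linalg.lstsq(X, x, rcond=None)
resid = x - X @ coef

# Model 2: AR(1) fit to the residuals (lag-1 least squares estimate).
alpha = np.dot(resid[:-1], resid[1:]) / np.dot(resid[:-1], resid[:-1])

# Forecast h steps ahead: Model 1 forecast plus Model 2 residual forecast.
h = 12
t_new = np.arange(n, n + h)
X_new = np.column_stack(
    [np.ones(h), t_new] + [(t_new % period == m).astype(float) for m in range(1, period)]
)
forecast = X_new @ coef + alpha ** np.arange(1, h + 1) * resid[-1]
```

The AR(1) residual forecast decays geometrically toward zero (the factor `alpha ** h`), so at long horizons the forecast reverts to the trend-plus-seasonality of Model 1, which is the behavior this workflow is designed to produce.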
Class Activity: Model for the Residuals from the Rexburg Weather Model (15 min)
```r
# Assumes the course setup chunk has loaded tidyverse, tsibble, and feasts
# (and that the rio package is installed).
weather_df <- rio::import("https://byuistats.github.io/timeseries/data/rexburg_weather_monthly.csv") |>
  mutate(dates = my(date_text)) |>
  filter(dates >= my("1/2008") & dates <= my("12/2023")) |>
  rename(x = avg_daily_high_temp) |>
  mutate(TIME = 1:n()) |>
  mutate(
    zTIME = (TIME - mean(TIME)) / sd(TIME),
    month_index = month(dates),
    month_index = factor(month_index, levels = 1:12, labels = month.abb)
  ) |>
  as_tsibble(index = TIME)

weather_df |>
  as_tsibble(index = dates) |>
  autoplot(.vars = x) +
  geom_smooth(method = "lm", se = FALSE, color = "#F0E442") +
  labs(
    x = "Month",
    y = "Mean Daily High Temperature (Fahrenheit)",
    title = "Time Plot of Mean Daily Rexburg High Temperature by Month",
    subtitle = paste0(
      "(",
      format(weather_df$dates %>% head(1), "%b %Y"),
      endash,  # `endash` is assumed to be defined earlier, e.g. endash <- "\u2013"
      format(weather_df$dates %>% tail(1), "%b %Y"),
      ")"
    )
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  )
```
Figure 1: Time plot of the monthly mean daily high temperatures in Rexburg, Idaho, in degrees Fahrenheit
However, \(\beta_1\) is expressed in terms of our standardized time variable. To state it as a change per month, we divide the estimate by \(\sigma_t\), the standard deviation of the time index (giving an estimate of 0.0124). To state it as a change per year, we multiply that monthly value by 12 (giving an estimate of 0.148).
Figure 2: Time plot of the monthly mean daily high temperatures in Rexburg, Idaho, in degrees Fahrenheit; the fitted values from the regression model are given in orange
Fitting an ARMA(p,q) Model
First, we create ACF and PACF plots of the residuals from the regression model above.
Figure 28: Partial correlogram of the residuals from the fitted regression model
```r
###############################################################
# Be sure to adjust this model to match what you did above
###############################################################

# Model without any seasonal trends
retail_no_seasonal_lm <- retail_ts |>
  model(
    retail_full_quad = TSLM(
      log(sales_millions) ~
        (in_great_recession * std_t) +
        (after_great_recession * std_t) +
        (in_great_recession * I(std_t^2)) +
        (after_great_recession * I(std_t^2)) +
        (in_great_recession * I(std_t^3)) +
        (after_great_recession * I(std_t^3))
    )
  )

retail_ts |>
  autoplot(.vars = sales_millions) +
  # ggplot(mapping = aes(x = month, y = sales_millions)) +
  # geom_line() +
  geom_line(
    data = augment(retail_no_seasonal_lm),
    aes(x = month, y = .fitted),
    color = "#E69F00",
    linewidth = 1
  ) +
  geom_line(
    data = augment(retail_final_lm),
    aes(x = month, y = .fitted),
    color = "blue",
    alpha = 0.7
    # linewidth = 1
  ) +
  labs(
    x = "Month",
    y = "Sales (Millions of U.S. Dollars)",
    title = paste0(retail_ts$business[1], " (", retail_ts$naics[1], ")")
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
```
Figure 29: Time plot of the retail sales data for full-service restaurants; the regression curve without the seasonal terms is given in orange; the predicted values from the model are given in blue
Table 3: Results from the process of fitting an ARMA model to the residuals of the model for the total sales in full-service restaurants in the United States