What would you add to your learning journal after reading another student’s?
What would you recommend the other student add to their learning journal?
Sign the Learning Journal review sheet for your peer
Class Activity: Non-seasonal ARIMA Models (15 min)
Effect of Differencing
In Chapter 4 Lesson 2, we found that if we compute the first difference of the price of McDonald’s stock from July 2020 through December 2023, the differences can be modeled as white noise. Sometimes differencing can remove trends.
Consider two cases: a random walk, and a linear trend with white noise errors.
Random Walk
Check Your Understanding
Consider the random walk
\[
x_t = x_{t-1} + w_t
\]
where \(\{w_t\}\) is a white noise process.
What is the model for the first differences of this time series?
Linear Trend with White Noise Errors
Check Your Understanding
Consider a time series with a linear trend and white noise errors.
\[
x_t = a + bt + w_t
\]
where \(\{w_t\}\) is a white noise process.
What is the model for the first differences of this time series?
What is the model obtained by subtracting \(a+bt\) from this series?
What are some potential concerns of using differencing to eliminate a deterministic trend?
Fitting an ARIMA Model when the Difference Model has a Non-Zero Mean
(See the last sentence in the first paragraph on page 138.)
Differencing a Time Series or the Logarithm of a Time Series
If the differences of a time series show increasing variation over time, taking the logarithm before differencing can eliminate the increasing variation in the differences. As an example, consider the Australian electricity production series given in the book.
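To see why the logarithm helps, here is a minimal sketch on simulated data (not the Australian electricity series from the book). The series `x`, its growth rate, and the helper `sd_ratio()` are illustrative assumptions: the level grows roughly exponentially, so the raw differences spread out over time while the differences of the logarithm do not.

```r
# Simulated series with roughly exponential growth (illustrative, not the book's data)
set.seed(123)
n <- 400
log_x <- cumsum(rnorm(n, mean = 0.01, sd = 0.02))  # random walk with drift on the log scale
x <- exp(log_x)

# Hypothetical helper: ratio of the standard deviation of the differences
# in the second half of the series to that in the first half
sd_ratio <- function(series) {
  d <- diff(series)
  half <- length(d) %/% 2
  sd(d[(half + 1):length(d)]) / sd(d[1:half])
}

ratio_raw <- sd_ratio(x)       # expected well above 1: variation grows with the level
ratio_log <- sd_ratio(log(x))  # expected close to 1: variation is roughly constant

c(raw = ratio_raw, log = ratio_log)
```

A ratio near 1 for the logged series suggests that the differences of the logarithm have roughly constant variation, which is what an ARIMA model assumes.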
Definition of an Integrated Series of Order \(d\), \(I(d)\)
We say that a time series \(\{x_t\}\) is integrated of order \(d\) if the \(d^{th}\) difference of \(\{x_t\}\) is a white noise process \(\{w_t\}\). Expressed differently, we write this as \(\nabla^d x_t = w_t\). We denote an integrated time series of order \(d\) as \(I(d)\).
Recall that \(\nabla^d \equiv \left( 1 - \mathbf{B} \right)^d\). So, either of the following can be used to indicate an integrated time series of order \(d\):
\[
\nabla^d x_t = w_t
\qquad \text{or} \qquad
\left( 1 - \mathbf{B} \right)^d x_t = w_t
\]
A linear trend can be removed by first-order differencing. A curved trend can sometimes be eliminated by second-order differencing.
In some cases, a lagged difference is more appropriate. For example, if you have monthly data and need to remove additive seasonal effects, you may want to take a difference with a lag of 12. This subtracts each January observation from the following January's observation, and likewise for every other month, so the differenced series models the year-over-year change.
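The effect of each kind of differencing can be checked on series with no noise, where the result is exact. This is a minimal sketch using base R's `diff()`; the trend coefficients and the monthly effects in `seasonal` are made-up values.

```r
t <- 1:48  # four years of monthly observations

# Linear trend: first differences are constant (equal to the slope)
linear <- 5 + 2 * t
d1 <- diff(linear)

# Quadratic trend: second differences are constant
quadratic <- 5 + 2 * t + 0.5 * t^2
d2 <- diff(quadratic, differences = 2)

# Linear trend plus additive monthly effects: a lag-12 difference removes the
# seasonal effects and leaves the year-over-year change (12 times the slope)
seasonal <- rep(c(3, 1, -2, -4, 0, 2, 5, 4, 1, -3, -5, -2), times = 4)
monthly <- linear + seasonal
d12 <- diff(monthly, lag = 12)

range(d1)   # both values equal 2
range(d2)   # both values equal 1
range(d12)  # both values equal 24
```

Note that each differencing step shortens the series: the lag-12 difference of 48 observations has 36 values.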
A time series is said to follow an \(ARIMA(p,d,q)\) process if the \(d^{th}\) differences of the time series follow an \(ARMA(p,q)\) process.
Suppose we let \(y_t = \left( 1 - \mathbf{B} \right)^d x_t\). The series \(\{y_t\}\) follows an \(ARMA(p,q)\) process if \(\theta_p \left(\mathbf{B} \right) y_t = \phi_q \left(\mathbf{B} \right)w_t\).
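Substituting, we find that \(\{x_t\}\) follows an \(ARIMA(p,d,q)\) process if
\[
\theta_p \left(\mathbf{B} \right) \left( 1 - \mathbf{B} \right)^d x_t = \phi_q \left(\mathbf{B} \right) w_t
\]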
This is an ARIMA(1,1,1) process with parameters \(\alpha = 0.5\) and \(\beta = 0.3\).
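One way to simulate and refit such a process is sketched below, assuming base R's `arima.sim()` and `arima()`; the seed and series length are arbitrary choices. Written out, the model is \(x_t = 1.5 x_{t-1} - 0.5 x_{t-2} + w_t + 0.3 w_{t-1}\).

```r
set.seed(1)
n <- 1000

# Simulate an ARIMA(1,1,1) process with AR coefficient 0.5 and MA coefficient 0.3
x <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = n)

# Because d = 1, arima.sim() returns n + 1 values (the series starts at 0)
length(x)

# Refit the model; the estimates should land near 0.5 and 0.3
fit <- arima(x, order = c(1, 1, 1))
coef(fit)
```

The `ar` and `ma` entries accept vectors, so higher-order models only require longer coefficient vectors and a matching `order`.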
Check Your Understanding
Modify the code above to simulate from an \(ARIMA(2,1,2)\) process with parameters \(\alpha_1 = 0.5\), \(\alpha_2 = 0.2\), \(\beta_1 = 0.4\), and \(\beta_2 = 0.1\).
Class Activity: Fitting an ARIMA Process - Exchange Rates (10 min)
The data file exchange_rates.parquet gives the exchange rates for foreign currencies. The daily-observed values in the time series are the amount in the foreign currency equivalent to one U.S. dollar. We will consider the exchange rate for converting one U.S. dollar into euros.
Here is one way to determine which model is selected by the “auto” process.
exchange_model |> select(auto)
# A mable: 1 x 1
auto
<model>
1 <ARIMA(1,1,1) w/ drift>
We now examine all the fitted models to determine the value of the residual mean squared error (sigma2), log-likelihood, AIC, AICc, and BIC. For the log-likelihood, larger values are preferable. For all other measures, smaller values are preferred.
exchange_model |> glance()
Table 2: Values used in the model selection process for the time series representing the exchange rate to convert US$1 into Euros

| Model | sigma2 | log_lik | AIC | AICc | BIC |
|---|---|---|---|---|---|
| auto | 0 | 1485.2 | -2962.5 | -2962.3 | -2947.5 |
| a000 | 0 | 979.2 | -1954.4 | -1954.3 | -1946.9 |
| a001 | 0 | 1176.6 | -2347.1 | -2347.1 | -2335.9 |
| a002 | 0 | 1296.1 | -2584.2 | -2584.1 | -2569.2 |
| a100 | 0 | 1469.5 | -2932.9 | -2932.9 | -2921.7 |
| a101 | 0 | 1490.6 | **-2973.2** | **-2973.1** | **-2958.2** |
| a102 | **0** | 1491.5 | -2973 | -2972.8 | -2954.2 |
| a200 | 0 | 1484.2 | -2960.4 | -2960.3 | -2945.4 |
| a201 | 0 | 1491.3 | -2972.7 | -2972.5 | -2953.9 |
| a202 | 0 | **1491.6** | -2971.2 | -2970.9 | -2948.7 |
| a011 | 0 | 1484.2 | -2962.3 | -2962.2 | -2951.1 |
| a012 | 0 | 1485.6 | -2963.1 | -2963 | -2948.1 |
| a110 | 0 | 1477.6 | -2949.1 | -2949 | -2937.9 |
| a111 | 0 | 1485.2 | -2962.5 | -2962.3 | -2947.5 |
| a112 | 0 | 1485.9 | -2961.7 | -2961.5 | -2943 |
| a210 | 0 | 1485.1 | -2962.2 | -2962 | -2947.2 |
| a211 | 0 | 1485.9 | -2961.8 | -2961.6 | -2943.1 |
| a212 | 0 | 1486 | -2959.9 | -2959.6 | -2937.4 |
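A comparison like the one in Table 2 can be imitated in base R. The sketch below is not the lesson's code: it uses a simulated series in place of the exchange-rate data, `stats::arima()` rather than the modeling function behind `exchange_model`, and holds \(d = 1\) fixed so that every candidate is fitted to the same differenced series. The model labels follow the pattern of the table: the letter a followed by the values of p, d, and q. The helper `fit_one()` is a hypothetical name.

```r
set.seed(42)
# Simulated stand-in for the data: an ARIMA(1,1,1) series
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 500)

# Candidate orders: p and q range over 0, 1, 2 with d fixed at 1
orders <- expand.grid(p = 0:2, q = 0:2)

fit_one <- function(p, q) {
  # Skip a candidate if the optimizer fails rather than stopping the loop
  fit <- tryCatch(arima(y, order = c(p, 1, q), method = "ML"),
                  error = function(e) NULL)
  if (is.null(fit)) return(NULL)
  data.frame(model = paste0("a", p, 1, q), p = p, q = q,
             sigma2 = fit$sigma2,
             log_lik = as.numeric(logLik(fit)),
             AIC = AIC(fit),
             BIC = BIC(fit))
}

results <- do.call(rbind, Map(fit_one, orders$p, orders$q))

# Smaller AIC is better; larger log-likelihood is better
results[order(results$AIC), ]
```

Sorting by AIC puts the preferred candidate in the first row; sorting by BIC instead tends to favor models with fewer parameters.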
Suppose we choose to apply the “auto” model, which is \(ARIMA(1,1,1)\) with drift. The model parameters are summarized here:
exchange_model |> select(auto) |> coefficients()
| .model | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| auto | ar1 | -0.1898907 | 0.1302066 | -1.4583798 | 0.1457386 |
| auto | ma1 | 0.5648437 | 0.1095075 | 5.1580342 | 0.0000004 |
| auto | constant | 0.0000450 | 0.0001977 | 0.2275258 | 0.8201635 |
The following plots give the acf and pacf of the residuals from this model.
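For readers who want to reproduce this kind of diagnostic outside the lesson's workflow, here is a minimal base-R sketch on a simulated series (an assumption; it does not use the exchange-rate data or `exchange_model`). If the model is adequate, the residual autocorrelations should mostly fall within the approximate 95% bounds \(\pm 1.96/\sqrt{n}\).

```r
set.seed(7)
# Simulated stand-in series and a fitted ARIMA(1,1,1) model
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 500)
fit <- arima(y, order = c(1, 1, 1))
res <- residuals(fit)

# Sample ACF and PACF of the residuals (set plot = TRUE to draw the correlograms)
res_acf <- acf(res, lag.max = 20, plot = FALSE)
res_pacf <- pacf(res, lag.max = 20, plot = FALSE)

# Approximate 95% bounds for white noise
bound <- 1.96 / sqrt(length(res))

# Share of lags 1 to 20 whose autocorrelation falls inside the bounds
share_inside <- mean(abs(res_acf$acf[-1]) < bound)
share_inside
```

The first element of `res_acf$acf` is the lag-0 autocorrelation, which is always 1, so it is dropped before comparing against the bounds.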
Small-Group Activity: Fitting an ARIMA Process - Microsoft Stock Prices (20 min)
A time series giving the daily closing price for Microsoft (MSFT) stock is created below. To handle the gaps in the data, we define a new variable, t, which gives the observation number.
# Load the packages this chunk relies on
library(tidyquant) # tq_get()
library(dplyr)     # mutate(), arrange(), select()
library(tsibble)   # as_tsibble()

# Set symbol and date range
symbol <- "MSFT" # Microsoft stock trading symbol
date_start <- "2020-01-01"
date_end <- "2024-03-28"

# Fetch stock prices
df_stock <- tq_get(symbol, from = date_start, to = date_end, get = "stock.prices")

# Transform data into a tsibble indexed by the observation number t
stock_ts <- df_stock |>
  mutate(dates = date, value = close) |>
  dplyr::select(dates, value) |>
  as_tibble() |>
  arrange(dates) |>
  mutate(t = 1:n()) |>
  as_tsibble(index = t, key = NULL)
Check Your Understanding
Using the daily closing prices of Microsoft stock, do the following.
Make a time plot of the data.
Create a correlogram and partial correlogram of the stock prices.
Fit candidate \(ARIMA(p,d,q)\) models to the data.
Choose the “best” model, and justify your selection.
Generate a correlogram and partial correlogram of the residuals from your chosen model.
Make a histogram of the residuals from your model.
Did your model account for the time series?
Predict the value 60 trading days in the future.
Note: The “time” index is just an integer sequence in the stock_ts tsibble. So, apply the forecast() function as forecast(h = 60), rather than forecast(h = "60 days").