Autoregressive Integrated Moving Average (ARIMA)

Updated: September 24, 2025

What is ARIMA (Autoregressive Integrated Moving Average)?
– ARIMA is a statistical model used to describe and forecast a time series — a sequence of observations indexed by time (for example, daily prices, monthly sales, or quarterly GDP).
– It combines three ideas:
– Autoregressive (AR): the model uses past values of the series as predictors.
– Integrated (I): the series may be differenced d times to remove trends and achieve stationarity (constant mean and variance over time).
– Moving Average (MA): the model uses past forecast errors (residuals) as predictors.

Key terms (first use definitions)
– Time series: data points collected or recorded at successive points in time.
– Stationarity: a property of a time series whose statistical characteristics (mean, variance, autocorrelation) do not change over time.
– Lag operator L: an operator taking a series value back one period (L y_t = y_{t-1}).
– Autocorrelation: correlation of a series with lagged versions of itself.
– Partial autocorrelation: correlation between y_t and y_{t-k} after removing effects of intermediate lags.

ARIMA model notation and meaning
– ARIMA(p, d, q)
– p = order of autoregression (number of AR lags).
– d = degree of differencing needed to make the series stationary.
– q = order of moving average (number of MA terms).
– Compact form (using the lag operator L defined above): φ(L) (1 − L)^d y_t = c + θ(L) ε_t, where φ(L) is the AR polynomial, θ(L) is the MA polynomial, (1 − L)^d applies d differences, and ε_t is the error term.

Why differencing (the “I” part)?
– Many economic and financial series have trends or changing variance. Differencing (computing y_t − y_{t−1}, or higher orders) reduces trends and often yields a stationary series that is suitable for AR/MA modeling.
– Choosing the smallest d that produces stationarity is standard practice.

How AR and MA pieces differ
– AR(p): y_t depends on p past values: y_t = φ1 y_{t−1} + … + φp y_{t−p} + error.
– MA(q): y_t depends on q past errors: y_t = μ + error_t + θ1 error_{t−1} + … + θq error_{t−q}.
– ARMA(p,q): when the series is already stationary (d = 0), combine AR and MA terms.
– ARIMA allows nonstationary series by differencing first (d > 0).
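The contrast between AR and MA memory shows up clearly in simulation. A sketch with hypothetical coefficients (0.7 for the AR term, 0.5 for the MA term): the AR(1) series stays correlated with itself many lags back, while the MA(1) series "forgets" after one lag.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
eps = rng.normal(size=n)

# AR(1): y_t = 0.7 * y_{t-1} + eps_t  (past VALUES feed forward)
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.7 * ar[t - 1] + eps[t]

# MA(1): y_t = eps_t + 0.5 * eps_{t-1}  (past ERRORS feed forward)
ma = eps.copy()
ma[1:] += 0.5 * eps[:-1]

def acf_at(x, k):
    """Sample autocorrelation of x at lag k."""
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

# Lag-2 autocorrelation: decays geometrically for AR(1) (about 0.7**2),
# but is near zero for MA(1), whose memory cuts off after one lag.
print(acf_at(ar, 2), acf_at(ma, 2))
```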

Step-by-step checklist to build an ARIMA model
1. Collect clean, sufficiently long time-series data for the variable of interest.
2. Visualize the series (plot) and look for trend, seasonality, or structural breaks.
3. Test for stationarity (e.g., visual inspection, unit-root tests such as the Augmented Dickey–Fuller). If nonstationary, difference the series.
4. Choose d as the minimum number of differences needed to reach stationarity.

5. Examine the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the (differenced) series.
– ACF: correlation between observations separated by k lags.
– PACF: correlation at lag k after removing effects of intermediate lags.
– Rule-of-thumb for model orders: a sharp cutoff in the PACF after lag p suggests an AR(p) component; a sharp cutoff in the ACF after lag q suggests an MA(q) component. If neither cuts off cleanly, a mixed ARMA(p,q) is likely. These are heuristics — follow them with formal fitting and diagnostics.

6. Fit candidate ARIMA(p,d,q) models.
– Estimate parameters by maximum likelihood (or conditional least squares).
– Fit several (p,q) combinations suggested by the ACF/PACF heuristics and by information criteria (see step 7).
– For seasonal data, include seasonal orders (P,D,Q)s and try SARIMA models.

7. Compare and validate models.
– Use information criteria: Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC). Lower values are preferred, but watch for overfitting.
– Check residual diagnostics: residuals should resemble white noise (uncorrelated, near-zero mean). Plot residual ACF and perform a Ljung–Box Q test for remaining autocorrelation.
– Examine residual variance and any time-varying volatility (if present, consider ARCH/GARCH).
– Check forecast performance on holdout data (out-of-sample) using metrics such as mean absolute error (MAE) or root mean squared error (RMSE). Use rolling-origin cross-validation for robust assessment.

8. Produce forecasts and forecast intervals.
– For h-step-ahead forecasts, iterate the model forward; when differencing was used, invert the differences to return forecasts to the original scale of the series. Forecast intervals widen as the horizon h grows, reflecting accumulating uncertainty.