What is ARIMA?
An ARIMA model, short for AutoRegressive Integrated Moving Average, is a statistical method for modeling and forecasting time series data. It combines three components to model patterns in a dataset over time: autoregression (AR), differencing for stationarity (I), and moving averages (MA).
- Autoregressive (AR): Models the current value as a function of its past values.
- Integrated (I): Refers to differencing the data one or more times to remove trends and make the series stationary, meaning its statistical properties (such as mean and variance) stay consistent over time.
- Moving average (MA): Models the current value as a function of past forecast errors.
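The differencing step can be illustrated with a minimal sketch. The helper name `difference` and the two series are made up for illustration; the point is only that one round of differencing flattens a linear trend, while a quadratic trend needs two.

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass replaces y[t] with y[t] - y[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Hypothetical series with a linear trend (made-up numbers).
linear = [10, 13, 16, 19, 22, 25]
# Hypothetical series with a quadratic trend: 0, 1, 4, 9, 16, 25.
quadratic = [t * t for t in range(6)]

print(difference(linear, d=1))     # [3, 3, 3, 3, 3]
print(difference(quadratic, d=1))  # [1, 3, 5, 7, 9]  (still trending)
print(difference(quadratic, d=2))  # [2, 2, 2, 2]
```

Each pass shortens the series by one observation, which is one reason to difference only as many times as needed.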
ARIMA models are specifically designed to handle time series data, which often violate the assumptions of ordinary linear regression, in particular the assumption that errors are independent of one another. Unlike ordinary regression, ARIMA explicitly models autocorrelation, where past values influence future ones, and it can handle non-stationary data by applying differencing to remove trends. It also incorporates moving average components to account for serial correlation left in the errors, which ordinary regression does not model. These capabilities make ARIMA well-suited for forecasting problems where a series of observations has an internal structure shaped by time order.
ARIMA models are useful for analyzing and forecasting data where values depend on previous values, such as monthly sales figures, stock prices, or temperature records. The basic (non-seasonal) ARIMA model is best suited to data without a clear seasonal pattern. If seasonality is present, analysts often use an extension called SARIMA (Seasonal ARIMA).
ARIMA Model
An ARIMA model is typically written as ARIMA(p, d, q), where:
- p is the number of autoregressive terms.
- d is the number of times the data must be differenced to become stationary.
- q is the number of lagged forecast errors in the model.
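Written out, one common form of the model is shown below. The notation is an assumption rather than something this entry fixes: w_t stands for the series after differencing d times, B is the backshift operator (B y_t = y_{t-1}), c is an optional constant, and ε_t is the forecast error at time t. Sign conventions for the θ terms vary between textbooks and software.

```latex
\begin{aligned}
w_t &= (1 - B)^d \, y_t \\
w_t &= c + \sum_{i=1}^{p} \phi_i \, w_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
\end{aligned}
```

With p = 1, d = 1, and q = 0, for instance, this reduces to w_t = c + ϕ_1 w_{t-1} + ε_t, where w_t = y_t - y_{t-1}.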
An ARIMA(p, d, q) model is like a custom recipe for forecasting a time series.
- Differencing (d) first removes trend-driven non-stationarity, giving you stable “ingredients.”
- The autoregressive part (p) folds in lagged values of the series (how yesterday influences today).
- The moving average part (q) stirs in lagged forecast errors to mop up leftover serial correlation.
The model estimates constant coefficients (ϕ’s and θ’s) that best fit the data. Ideally, the finished model captures the series’ dependence structure and can project it forward to generate forecasts.
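As a rough sketch of that fit-then-project cycle, the code below handles only the simplest interesting case, ARIMA(1, 1, 0), using ordinary least squares on the differenced series. Several things here are assumptions made for illustration: the function names, the constant `c`, and the noise-free synthetic data. Statistical packages typically estimate ARIMA models by maximum likelihood, and MA terms need iterative estimation, so they are left out.

```python
def fit_arima_110(series):
    """Least-squares fit of w[t] = c + phi * w[t-1], where w[t] = y[t] - y[t-1].

    Returns (c, phi). This is a simplified stand-in for the maximum
    likelihood estimation that statistical packages normally use.
    """
    w = [b - a for a, b in zip(series, series[1:])]  # d = 1: difference once
    x, y = w[:-1], w[1:]                             # regress w[t] on w[t-1]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, c, phi, steps):
    """Project the series forward by forecasting differences and re-accumulating them."""
    level = series[-1]
    diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = c + phi * diff  # next forecast difference (future errors set to 0)
        level += diff          # undo the differencing to get back to levels
        out.append(level)
    return out


# Synthetic, noise-free data (an assumption for illustration): the differences
# follow w[t] = 1 + 0.5 * w[t-1] exactly, so the fit should recover values
# close to c = 1 and phi = 0.5.
series = [10.0]
step = 4.0
for _ in range(8):
    series.append(series[-1] + step)
    step = 1.0 + 0.5 * step

c, phi = fit_arima_110(series)
print(round(c, 6), round(phi, 6))
print(forecast(series, c, phi, steps=3))
```

Forecasting the differences and then summing them back up is what the "integrated" part of the name refers to.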
Checking Assumptions
After fitting an ARIMA model, checking the residuals is crucial. The model should capture all the structure in the data, leaving behind only white noise (i.e., random variation with no patterns) in the residuals. Ideally, the residuals should have no autocorrelation, a constant variance, a mean near zero, and an approximately normal distribution, especially if you plan to use confidence intervals or prediction intervals.
To assess these assumptions, analysts typically inspect residual plots, examine ACF and PACF plots of the residuals, and run a Ljung–Box test to check for remaining autocorrelation. A Q-Q plot can help evaluate the normality of the residuals. If any of these diagnostics show a problem, you may need to adjust the model, for example by changing the number of AR or MA terms, applying additional differencing, or using a variance-stabilizing transformation such as a logarithm. After modifying the model, you refit it and recheck the diagnostics until the residuals behave appropriately.
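To make the Ljung–Box check concrete, here is a minimal sketch of its Q statistic, Q = n(n + 2) Σ r_k² / (n − k) summed over lags k = 1..h, where r_k is the lag-k sample autocorrelation of the residuals. The function names and the residual values are hypothetical. The sketch stops at the statistic itself; in practice a statistics library would also supply the p-value.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    total = sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * total


# Hypothetical residuals that alternate in sign: clearly not white noise.
residuals = [1.0, -1.0] * 10

q = ljung_box_q(residuals, max_lag=1)
print(round(q, 3))  # roughly 20.9
```

Under the null hypothesis of no autocorrelation, Q approximately follows a chi-square distribution; for residuals from a fitted ARIMA(p, d, q) model the degrees of freedom are usually taken as h − p − q. Ignoring that adjustment here, the 5% critical value for one degree of freedom is about 3.84, so a Q near 20.9 would signal that the model has left structure in the residuals.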
ARIMA Example
For example, a company might use an ARIMA model to forecast next quarter’s revenue from its revenue in previous quarters. Differencing handles the underlying trend, while the AR and MA terms capture how each quarter’s figure relates to recent quarters and recent forecast errors, so the forecast reflects patterns in the company’s own history rather than guesswork.