Time Series Analysis

Rolling Average

What is a Rolling Average?

A rolling average, also called a moving average, is a technique that smooths out short-term fluctuations in data and highlights longer-term trends. It works by taking the average of a fixed number of consecutive observations (called the window) and then shifting that window forward one data point at a time.

For example, a 7-day rolling average of daily temperatures calculates the average temperature for Days 1 through 7, then Days 2 through 8, then Days 3 through 9, and so on. Each average becomes one point in the smoothed series.

Analysts commonly use rolling averages in time series data, such as stock prices, weather trends, or COVID-19 case counts. They reduce noise from day-to-day variation and help reveal the underlying pattern.

A good rule of thumb is to select a window size that reflects a natural cycle in the data. For instance, a 7-day window often works well for daily data with weekly patterns, such as retail sales or hospital admissions. Matching the window to the cycle helps the average filter out repeating ups and downs while preserving meaningful trends.
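
As a rough illustration of matching the window to the cycle, the sketch below builds a made-up daily series with a repeating weekly pattern and compares a 3-day and a 7-day trailing average. The data values and the helper name rolling_mean are assumptions for this example and do not come from a real dataset.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; returns one average per full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily counts: the same 7-day pattern repeated for 3 weeks.
weekly_pattern = [10, 12, 11, 13, 15, 20, 18]
daily = weekly_pattern * 3

smooth_3 = rolling_mean(daily, 3)
smooth_7 = rolling_mean(daily, 7)

# Spread (max minus min) of each smoothed series.
spread_3 = max(smooth_3) - min(smooth_3)
spread_7 = max(smooth_7) - min(smooth_7)

print(f"3-day window spread: {spread_3:.2f}")
print(f"7-day window spread: {spread_7:.2f}")
```

Because every 7-day window in this toy series contains each day of the week exactly once, the 7-day average is flat, while the 3-day average still swings with the weekly cycle.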

How to Calculate a Rolling Average

The following steps show you how to calculate a rolling average:

Choose a window size such as 3, 7, or 30.
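For each position in the data series, average the current value and the values immediately before it so that the window holds exactly the chosen number of observations (this is a trailing average). The earliest positions, which have fewer prior values than the window requires, get no average.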
Slide the window forward by one data point and repeat.

For example, suppose we have daily sales data for seven days:
5, 8, 6, 7, 10, 12, 9

The 3-day rolling average values are:

Day 3: (5 + 8 + 6) ÷ 3 = 6.33
Day 4: (8 + 6 + 7) ÷ 3 = 7.00
Day 5: (6 + 7 + 10) ÷ 3 = 7.67
Day 6: (7 + 10 + 12) ÷ 3 = 9.67
Day 7: (10 + 12 + 9) ÷ 3 = 10.33

Each value reflects the local trend over a short window of time.
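
The worked example above can be reproduced with a few lines of Python. This is a minimal sketch of a trailing rolling average; the function name rolling_average is an assumption for this example, and the data are the seven daily sales values from the text.

```python
def rolling_average(values, window):
    """Return the trailing rolling average for every full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the length of the series")
    averages = []
    # 'end' is one past the last index of the current window.
    for end in range(window, len(values) + 1):
        window_values = values[end - window : end]
        averages.append(sum(window_values) / window)
    return averages


sales = [5, 8, 6, 7, 10, 12, 9]
three_day = rolling_average(sales, 3)

# The first full window ends on Day 3, so labeling starts there.
for day, avg in enumerate(three_day, start=3):
    print(f"Day {day}: {avg:.2f}")
```

Days 1 and 2 produce no value because a full 3-day window is not yet available.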

For example, a 7-day rolling average of daily new COVID-19 deaths helped public health officials see whether deaths were rising or falling without being misled by single-day spikes or dips caused by reporting delays.

Stationarity

By Jim Frost

In time series analysis, stationarity refers to a condition where the statistical properties of a time series—such as its mean, variance, and autocorrelation—remain constant over time. A stationary time series does not have trends, changing variability, or evolving seasonal patterns. This stability makes it easier to model and forecast, which is why many time series methods, including ARIMA, require the data to be stationary.

There are two main types:

Strict stationarity: The entire distribution of the process remains unchanged over time.
Weak (or second-order) stationarity: Only the mean, variance, and autocorrelation structure are constant. This form is more commonly used in practice.

The graphs below show two time series:

The left panel displays a stationary time series (constant mean and variance).
The right panel shows a non-stationary time series (a drifting mean).

Most real-world time series data are not stationary. Trends, seasonal cycles, or changing volatility can make a series non-stationary. To address this, analysts often apply a process called differencing, which subtracts the previous value from each value (or, for seasonal patterns, the value one full season earlier), to remove trends or seasonality and transform the data into a stationary form. In some cases, multiple rounds of differencing may be needed.

For example, monthly sales data that steadily increase over time are not stationary due to the upward trend. But by applying differencing, the resulting series may fluctuate around a constant mean, making it suitable for models like ARIMA that assume stationarity. Identifying and correcting non-stationarity is a critical first step in reliable time series analysis.
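
The sketch below shows first-order differencing on a made-up trending series. The sales figures and the helper names difference and half_means are assumptions for this example. Comparing the means of the two halves is only a crude indication of a drifting mean, not a formal stationarity test.

```python
def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t where both exist."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def half_means(values):
    """Mean of the first and second half; a crude check for a drifting mean."""
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return sum(first) / len(first), sum(second) / len(second)


# Hypothetical monthly sales with a steady upward trend.
sales = [100, 104, 109, 113, 118, 122, 127, 131]
diffed = difference(sales)

print("Differenced series:", diffed)
print("Half means before differencing:", half_means(sales))
print("Half means after differencing: ", half_means(diffed))
```

In this toy series the two half means differ by 18 units before differencing and by less than 1 unit afterward. Calling difference(diffed) would apply a second round, and passing lag=12 to monthly data would give a seasonal difference.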

Autoregressive Model [AR Model]

An autoregressive model (AR model) is a statistical model that analyzes and forecasts time series data. These models express the current value of a time series as a weighted sum of its previous values. The term “autoregressive” reflects this idea of self-reference—each value is regressed on past values of the same variable. After fitting the model, analysts can use it to forecast future values by applying the same structure to the most recent data.

Analysts commonly use AR models in economics, environmental science, and engineering for forecasting trends in time series where past behavior carries forward. For example, an economist might use an AR model to forecast next month’s unemployment rate based on previous months’ rates, assuming the pattern persists over time.

How an AR Model Works

An AR model is written as AR(p), where p is the number of lagged observations included in the model. For example, an AR(2) model uses the two previous time points to predict the current one. The general form looks like this:

Yₜ = ϕ₁Yₜ₋₁ + ϕ₂Yₜ₋₂ + … + ϕₚYₜ₋ₚ + εₜ

Where:

Yₜ is the value at time t,
ϕ₁ through ϕₚ are the model coefficients (weights),
εₜ is the random error at time t.

Autoregressive models assume the data are stationary, meaning their statistical properties, such as mean and variance, don’t change over time. If the data aren’t stationary, they often need to be transformed (e.g., by differencing) before fitting an AR model.

When fitting an autoregressive model, the goal is to estimate the coefficients (ϕ-values) that determine how much weight to assign to each previous value in the series. The model chooses these coefficients by minimizing the prediction errors—the differences between the actual values and the values predicted by the model. This process ensures that the model closely reflects the time-dependent structure in the data.
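
One way to minimize the prediction errors is ordinary least squares on the lagged values. The sketch below is an illustration under that assumption; statistical software may use other estimators, such as Yule-Walker or maximum likelihood. The function names fit_ar, solve_linear_system, and forecast_next are hypothetical, the model has no intercept (matching the general form above), and the series is synthetic and noise-free so that the known coefficients are recovered.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(series, p):
    """Estimate AR(p) coefficients [phi_1, ..., phi_p] by least squares."""
    n = len(series)
    if n - p < p:
        raise ValueError("series is too short for the requested order")
    # Each row holds the p lagged predictors for one target value.
    rows = [[series[t - k] for k in range(1, p + 1)] for t in range(p, n)]
    targets = series[p:]
    # Normal equations: (X'X) phi = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def forecast_next(series, phi):
    """One-step-ahead forecast: phi_1 * Y[t-1] + ... + phi_p * Y[t-p]."""
    return sum(coef * series[-k] for k, coef in enumerate(phi, start=1))


# Synthetic AR(2) series generated with known weights and no noise.
true_phi = [0.5, 0.3]
series = [1.0, 2.0]
for _ in range(18):
    series.append(true_phi[0] * series[-1] + true_phi[1] * series[-2])

phi_hat = fit_ar(series, p=2)
print("Estimated coefficients:", [round(c, 4) for c in phi_hat])
print("Next-value forecast:", round(forecast_next(series, phi_hat), 4))
```

With real, noisy data the estimates would only approximate the underlying weights, and a series with a nonzero mean would typically be centered or given an intercept first.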

Example

For example, suppose an economist fits an AR(2) model to forecast monthly unemployment rates and the estimated model is:

If last month’s unemployment rate was 7.2% and the month before was 7.5%, the model predicts the current month’s rate as:

This predicted value reflects how much influence recent months have on the current estimate, based on the learned weights from past data.
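
The estimated equation and its prediction are not reproduced in this text. The sketch below only illustrates the arithmetic of an AR(2) forecast with no intercept; the two coefficients are hypothetical values chosen for the example and are not the economist's estimates. Only the two unemployment rates come from the text.

```python
# Hypothetical coefficients for illustration only. They are NOT the
# estimates from the fitted model, which is not shown in the text.
phi_1 = 0.6
phi_2 = 0.3

last_month = 7.2      # unemployment rate (%) one month back, from the text
two_months_ago = 7.5  # unemployment rate (%) two months back, from the text

# AR(2) point forecast: weighted sum of the two most recent values.
forecast = phi_1 * last_month + phi_2 * two_months_ago
print(f"Forecast under the assumed coefficients: {forecast:.2f}%")
```

Under these assumed weights the forecast is 0.6 × 7.2 + 0.3 × 7.5 = 6.57%. Different coefficients would give a different prediction.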

ARIMA

What is ARIMA?

An ARIMA model, short for AutoRegressive Integrated Moving Average, is a statistical method for modeling and forecasting time series data. It combines three components to model patterns in a dataset over time: autoregression (AR), differencing for stationarity (I), and moving averages (MA).

Autoregressive (AR): Models the current value as a function of its past values.
Integrated (I): Refers to differencing the data one or more times to remove trends and make the series stationary—meaning its properties stay consistent over time.
Moving average (MA): Models the current value as a function of past forecast errors.

ARIMA models are specifically designed to handle time series data, which often violate the assumptions of ordinary linear regression. Unlike regression, ARIMA accounts for autocorrelation, where past values influence future ones, and it can model non-stationary data by applying differencing to remove trends. It also incorporates moving average components to account for patterns in the residuals—something standard regression can’t handle. These capabilities make ARIMA well-suited for forecasting problems where a series of observations has an internal structure shaped by time order effects.

ARIMA models are useful for analyzing and forecasting data where values depend on previous values, such as monthly sales figures, stock prices, or temperature records. They are especially powerful when the data do not follow a clear seasonal pattern (non-seasonal ARIMA). If seasonality is present, analysts often use an extension called SARIMA (Seasonal ARIMA).

ARIMA Model

An ARIMA model is typically written as ARIMA(p, d, q), where:

p is the number of autoregressive terms.
d is the number of times the data must be differenced to become stationary.
q is the number of lagged forecast errors in the model.

An ARIMA(p, d, q) model is like a custom recipe for forecasting a time series.

Differencing (d) first removes trend-driven non-stationarity, giving you stable “ingredients.”
The autoregressive part (p) folds in lagged values of the series (how yesterday influences today).
The moving average part (q) stirs in lagged forecast errors to mop up leftover serial correlation.

The model estimates constant coefficients (ϕ’s and θ’s) that best fit the data. Ideally, the finished model captures the series’ dependence structure and can project it forward to generate forecasts.
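
To make the three parts concrete, the sketch below computes a one-step forecast for an ARIMA(1, 1, 1) model with no constant term. It is an illustration under several assumptions: the coefficients phi and theta are supplied rather than estimated, the MA term enters with a plus sign, the residual before the first observation is set to zero, and the function name and data are hypothetical.

```python
def arima_111_forecast(series, phi, theta):
    """One-step-ahead forecast from an ARIMA(1, 1, 1) with given coefficients."""
    # I (d = 1): difference once to work with period-to-period changes.
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]

    # Walk through the differenced series, tracking the last change and
    # the last one-step forecast error (residual).
    prev_diff, prev_error = 0.0, 0.0
    for value in diffs:
        predicted = phi * prev_diff + theta * prev_error  # AR part + MA part
        prev_error = value - predicted
        prev_diff = value

    # Forecast the next change, then undo the differencing.
    next_diff = phi * prev_diff + theta * prev_error
    return series[-1] + next_diff


# Hypothetical series and coefficients, for illustration only.
observations = [10.0, 12.0, 13.0, 15.0]
forecast = arima_111_forecast(observations, phi=0.5, theta=0.2)
print(f"Next value forecast: {forecast:.3f}")
```

Here the differenced series is 2, 1, 2. The forecast change of 1.316 is added back to the last observation, 15, giving 16.316. In practice, software estimates phi and theta from the data rather than taking them as given.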

Checking Assumptions

After fitting an ARIMA model, checking the residual plots is crucial. The model should capture all the structure in the data, leaving behind only white noise (i.e., random variation with no patterns) in the residuals. Ideally, the residuals should have no autocorrelation, a constant variance, a mean near zero, and a reasonably normal shape, especially if you plan to use confidence intervals or prediction intervals.

To assess these assumptions, analysts typically inspect residual plots, examine ACF and PACF plots, and run a Ljung–Box test to check for independence. A Q-Q plot can help evaluate the normality of the residuals. If any of these diagnostics show a problem, you may need to adjust the model—such as changing the number of AR or MA terms, applying additional differencing, or using a variance-stabilizing transformation. After modifying the model, you refit it and recheck the diagnostics until the residuals behave appropriately.
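
As a sketch of what the independence check computes, the code below calculates residual autocorrelations and the Ljung-Box Q statistic, Q = n(n + 2) Σ rₖ² / (n − k), summed over lags k = 1 to h. The function names and the residual values are assumptions for this example. The residuals alternate in sign on purpose, so they are clearly not white noise.

```python
def autocorrelations(residuals, max_lag):
    """Sample autocorrelations r_1 ... r_max_lag of the residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    c0 = sum(x * x for x in centered)  # lag-0 sum of squares
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    acf = autocorrelations(residuals, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


# Hypothetical residuals that flip sign every period (strong leftover pattern).
residuals = [1.0, -1.0] * 10
acf = autocorrelations(residuals, 3)
q_stat = ljung_box_q(residuals, 3)

print("Autocorrelations at lags 1-3:", [round(r, 2) for r in acf])
print(f"Ljung-Box Q (3 lags): {q_stat:.1f}")
```

For these residuals the lag-1 to lag-3 autocorrelations are -0.95, 0.90, and -0.85, and Q is 59.4. A large Q relative to its chi-squared reference distribution signals leftover autocorrelation; statistical software typically reports the corresponding p-value.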

ARIMA Example

For example, a company might use an ARIMA model to forecast next quarter’s revenue based on past trends and fluctuations in previous quarters. By capturing both trend and autocorrelation, ARIMA models help generate more accurate, data-driven forecasts.