Moving averages can smooth time series data, reveal underlying trends, and identify components for use in statistical modeling. Smoothing is the process of removing random variations that appear as coarseness in a plot of raw time series data. It reduces the noise to emphasize the signal that can contain trends and cycles. Analysts also refer to the smoothing process as filtering the data.

Developed in the 1920s, the moving average is the oldest process for smoothing data and continues to be a useful tool today. This method relies on the notion that observations close in time are likely to have similar values. Consequently, the averaging removes random variation, or noise, from the data.

In this post, I look at using moving averages to smooth time series data. This method is the simplest form of smoothing. In future posts, I’ll explore more complex ways of smoothing.

## What are Moving Averages?

Moving averages are a series of averages calculated using sequential segments of data points over a series of values. They have a length, which defines the number of data points to include in each average.

### One-sided moving averages

One-sided moving averages include the current and previous observations for each average. For example, the formula for a moving average (MA) of X at time *t* with a length of 7 is the following:
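Written out from that definition (the current observation plus the six before it), the length-7 case would look like this. The subscript notation is an assumption chosen for illustration:

$$
\text{MA}_t = \frac{X_t + X_{t-1} + X_{t-2} + X_{t-3} + X_{t-4} + X_{t-5} + X_{t-6}}{7}
$$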

In the graph, the circled one-sided moving average uses the seven observations that fall within the red interval. The subsequent moving average shifts the interval to the right by one observation, and so on.
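To make the sliding interval concrete, here is a minimal Python sketch of a one-sided moving average. The function name and the sample numbers are made up for illustration; positions that do not yet have enough previous observations are left as `None`.

```python
def one_sided_moving_average(values, length=7):
    """Average the current value and the (length - 1) values before it.

    Positions without enough history are returned as None.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    result = []
    for t in range(len(values)):
        if t < length - 1:
            result.append(None)  # not enough previous observations yet
        else:
            window = values[t - length + 1 : t + 1]  # the interval ending at t
            result.append(sum(window) / length)
    return result


series = [3, 5, 4, 6, 8, 7, 9, 10, 12]  # hypothetical data, not from the post
smoothed = one_sided_moving_average(series, length=7)
print(smoothed)
```

Each step drops the oldest observation from the interval and adds the newest one, which is the "shift to the right by one" described above.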

### Centered moving averages

Centered moving averages include both previous and future observations to calculate the average at a given point in time. In other words, each centered moving average uses the observations that surround it in both directions. Consequently, these averages are also known as two-sided moving averages. The formula for a centered moving average of X at time *t* with a length of 7 is the following:
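Written out from that definition, with three observations on each side of time *t*, the length-7 case would look like this. The label CMA is an assumption chosen for illustration:

$$
\text{CMA}_t = \frac{X_{t-3} + X_{t-2} + X_{t-1} + X_t + X_{t+1} + X_{t+2} + X_{t+3}}{7}
$$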

In the plot below, the circled centered moving average uses the seven observations in the red interval. The next moving average shifts the interval to the right by one.
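A minimal Python sketch of a centered moving average for odd lengths follows. The function name and sample numbers are hypothetical; positions that lack enough observations on either side are left as `None`.

```python
def centered_moving_average(values, length=7):
    """Average the value at t with (length // 2) neighbors on each side.

    This sketch handles odd lengths only. Positions too close to either
    end of the series are returned as None.
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("this sketch handles odd lengths only")
    half = length // 2
    result = []
    for t in range(len(values)):
        if t < half or t + half >= len(values):
            result.append(None)  # interval would run off the data
        else:
            window = values[t - half : t + half + 1]
            result.append(sum(window) / length)
    return result


series = [3, 5, 4, 6, 8, 7, 9, 10, 12]  # hypothetical data, not from the post
centered = centered_moving_average(series, length=7)
print(centered)
```

Note that the last three positions stay empty: a length-7 centered average for the most recent observation cannot be computed until three more observations arrive.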

Centered intervals work out evenly for an odd number of observations because they allow for an equal number of observations before and after the moving average. However, when you have an even length, the calculations must adjust for that by using a weighted moving average. For example, the formula for a centered moving average with a length of 8 is as follows:
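One way to write it, assuming the two end observations each receive half weight and the seven observations between them receive full weight (the label CMA is chosen for illustration):

$$
\text{CMA}_t = \frac{0.5\,X_{t-4} + X_{t-3} + X_{t-2} + X_{t-1} + X_t + X_{t+1} + X_{t+2} + X_{t+3} + 0.5\,X_{t+4}}{8}
$$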

For a length of 8, the calculation incorporates the formula for a length of 7 (*t*-3 through *t*+3). Then, it extends the segment by one observation in both directions (*t*-4 and *t*+4). However, those two observations each have half the weight, which yields the equivalent of 7 + 2 × 0.5 = 8 data points.
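The half-weight adjustment can be sketched in Python as below. This is only an illustration of the weighting just described; the function name and the sample numbers are hypothetical.

```python
def centered_moving_average_even(values, length=8):
    """Centered moving average for an even length.

    The (length - 1) observations around t get full weight, and the two
    observations at the ends of the interval get half weight each.
    """
    if length < 2 or length % 2 != 0:
        raise ValueError("this sketch handles even lengths only")
    half = length // 2
    result = []
    for t in range(len(values)):
        if t < half or t + half >= len(values):
            result.append(None)  # interval would run off the data
            continue
        inner = sum(values[t - half + 1 : t + half])  # t-3 .. t+3 when length is 8
        ends = 0.5 * (values[t - half] + values[t + half])  # t-4 and t+4
        result.append((inner + ends) / length)
    return result


series = [2, 4, 6, 8, 10, 12, 14, 16, 18]  # hypothetical data
print(centered_moving_average_even(series, length=8))
```

With nine observations, only the middle position has a full interval, so it is the only one that gets a value.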

## Using Moving Averages to Reveal Trends

Moving averages can remove seasonal patterns to reveal underlying trends. In future posts, I’ll write more about time series components and incorporating them into models for accurate forecasting. For now, we’ll work through an example to visually assess a trend.

When there is a seasonal pattern in your data and you want to remove it, set the length of your moving average to equal the pattern’s length. If there is no seasonal pattern in your data, choose a length that makes sense. Longer lengths will produce smoother lines.

Note that the term “seasonal” pattern doesn’t necessarily indicate a meteorological season. Instead, it refers to a repeating pattern that has a fixed length in your data.
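To see why matching the length to the pattern works, here is a small Python sketch on synthetic data: a steady upward trend plus a repeating 7-step pattern. All numbers are made up for illustration. Every 7-observation window contains each position of the pattern exactly once, so the pattern cancels out of the average.

```python
def trailing_average(values, length):
    """Average of each observation and the (length - 1) observations before it."""
    return [
        sum(values[t - length + 1 : t + 1]) / length
        for t in range(length - 1, len(values))
    ]


# Hypothetical repeating pattern with a fixed length of 7 (offsets sum to zero).
weekly_pattern = [5, 8, 2, 0, -3, -7, -5]

# Synthetic series: a trend that rises by 1 per step, plus the repeating pattern.
series = [100 + t + weekly_pattern[t % 7] for t in range(28)]

matched = trailing_average(series, length=7)     # length equals the pattern length
mismatched = trailing_average(series, length=5)  # length does not match

# Step-to-step changes in each smoothed series.
steps_matched = [b - a for a, b in zip(matched, matched[1:])]
steps_mismatched = [b - a for a, b in zip(mismatched, mismatched[1:])]

print(steps_matched)     # constant steps: only the trend remains
print(steps_mismatched)  # uneven steps: part of the pattern leaks through
```

With the matched length, the smoothed series rises by the same amount at every step, which is the underlying trend. With the mismatched length, the repeating pattern still shows up in the averages.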

## Time Series Example: Daily COVID-19 Deaths in Florida

For our example, I’ll use daily COVID-19 deaths in the State of Florida. The time series plot below displays a recurring pattern in the number of daily deaths.

This pattern likely reflects a data artifact. We know the coronavirus does not operate on a seven-day weekly schedule! Instead, it must reflect some human-based scheduling factor that influences when causes of death are determined and recorded. Some of these activities must be less likely to occur on weekends because the lowest day of the week is almost always Sunday, and weekends, in general, tend to be low. Tuesdays are often the highest day of the week. Perhaps that is when the weekend backlog shows up in the data?

Because of this seasonal pattern, the number of recorded deaths for a particular day depends on the day of the week you’re evaluating. Let’s remove this seasonal pattern to reveal the underlying trend component. The original data are from Johns Hopkins University. Download my Excel spreadsheet: Florida Deaths Time Series.

The graph displays one-sided moving averages with a length of 7 days for these data. Notice how the seasonal pattern is gone and the underlying trend is visible. Each moving average point is the daily average of the past seven days. We can look at any date, and the day of the week no longer plays a role. We can see that the trend increases up to April 17, 2020. It then plateaus, with a slight decline, until around June 22nd. Since then, there has been an upward trend that appears to steepen at the end.

Smoothing time series data helps reveal the underlying trends in your data. That process can aid in the simple visual assessment of the data, as seen in this article. However, it can also help you fit the best time series model to your data. The moving average is a simple but very effective calculation!

Nick says

Really interesting post Jim. However maths is definitely not my forte. My question is: If I wanted to smooth out a graph of a fast moving interval of say between ten and twenty seconds, with a refresh rate of five per second; which moving average would smooth out the graph best?

Jim Frost says

Hi Nick,

Ideally, you want to look for some natural period to use. In my example, a week is a natural period for the data in question. You need to consider if there is such a natural moving average length for your data. Unfortunately, there’s no generally correct answer other than to understand your data and use a length that makes sense.

If there is no such natural length, then you’ll need to experiment and strike a balance. The tradeoff in question is between a smooth curve that doesn’t have the noise versus a curve that reacts to changing conditions. A longer moving average length will produce a smoother curve but it reacts more slowly to recent changes. A shorter length will allow the curve to react more quickly to changing conditions, but the curve becomes noisy.

So, if there’s a natural length, try that. If not, experiment with your data and find a nice balance between a smooth curve and the ability to react to changing conditions.
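As a rough illustration of that tradeoff, the Python sketch below uses two synthetic series (both made up): one that jumps from 0 to 10, to measure how long each trailing average takes to catch up, and one that alternates between 9 and 11 as a simple stand-in for noise, to measure how much wiggle remains. The function names and the lengths tried are assumptions for illustration only.

```python
def trailing_average(values, length):
    """Average of each observation and the (length - 1) observations before it."""
    return [
        sum(values[t - length + 1 : t + 1]) / length
        for t in range(length - 1, len(values))
    ]


def observations_until_caught_up(length, change_at=20):
    """Observations needed after a jump from 0 to 10 before the average reads 10."""
    series = [0] * change_at + [10] * 40
    averages = trailing_average(series, length)
    for i, avg in enumerate(averages):
        t = i + length - 1  # position in the series this average belongs to
        if t >= change_at and avg == 10:
            return t - change_at
    return None


def wiggle(length):
    """Range (max - min) of the averages for a series alternating 11, 9, 11, 9, ..."""
    noisy = [10 + (1 if t % 2 == 0 else -1) for t in range(60)]
    averages = trailing_average(noisy, length)
    return max(averages) - min(averages)


for length in (3, 7, 15):
    print(length, observations_until_caught_up(length), round(wiggle(length), 3))
```

In this toy setup, longer lengths leave less wiggle but need more observations to catch up with the jump, which is the balance you have to strike when no natural length exists.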

I hope that helps!

George Hayes says

I can’t offer any formal statistical knowledge-based insights (i.e., I don’t know what I am talking about!) but I use MAs to depict trends and for basic planning in my work. Intuitively, I find CMAs more appealing, but to answer your question a bit, Jake – the problem is they start/stop short of the end of your data. So if you are using a 7 day CMA you can’t produce a number until you are 7 days in and that number is ‘4 days old’. You can’t calculate ‘today’s number’ until 3 more days have passed. That doesn’t make them wrong or not useful – but sometimes folk seem to want to see numbers for today actually on today. I believe this is why trailing MAs are more popular – the instant today’s data point is available, you can calculate your MA for today. Given that trend-spotting tools get used a lot in equities and markets, there is a drive to see something that gives a result ‘now’. You can say it’s better to wait all you like for the CMA to ‘catch up’ and give you a ‘better’ number, but by then the world of markets trading has moved on.

Jim Frost says

Hi George,

Thanks for providing the excellent and very practical reasoning about it! Very helpful from the real-world perspective!

Jake says

Hi Jim, so what is the reason for using centered moving average instead of simple moving average? What’s the difference?

Wellington Movers says

Thanks a lot for sharing moving average smoothing here; these kinds of ideas were much needed. I really appreciate that you have provided the data too. A really useful blog for us. Looking for more!!

Noah May says

To deal with the outlier, it would be best in most cases to clearly describe it and then remove it so it does not affect your main results. That is, unless the outlier directly affects the results you are researching. It depends on what you are trying to do.

Personally, I think it would be a little weird/confusing to have just a short MA just to ignore the effects of an outlier.

Noah

Tommer Rissin says

Hey Jim,

Great post, short and straightforward.

I have a small question which is not clear to me:

Is it acceptable to use this technique to smooth out only a part of the data?

For example, if there was one strong outlier that I didn’t want to affect the rest of the data, could I just smooth out the points around it?

Also this technique reminds me of image compression algorithms which “merge” nearby pixels with similar colors.

Thank you for your post,

Tommer

ray says

Thank you Jim, for another clear and short post. Nice.

Small suggestion: in the plots, add some tick marks (and some date grid lines) above each date in the X-axis. Makes it easier to follow your narrative.

Again Jim, thanks!

ray