Excel can perform various statistical analyses, including regression analysis. It is a great option because nearly everyone can access Excel. This post is an excellent introduction to performing and interpreting regression analysis, even if Excel isn’t your primary statistical software package.
Moving averages can smooth time series data, reveal underlying trends, and identify components for use in statistical modeling. Smoothing is the process of removing random variations that appear as coarseness in a plot of raw time series data. It reduces the noise to emphasize the signal that can contain trends and cycles. Analysts also refer to the smoothing process as filtering the data.
Developed in the 1920s, the moving average is the oldest process for smoothing data and continues to be a useful tool today. This method relies on the notion that observations close in time are likely to have similar values. Consequently, the averaging removes random variation, or noise, from the data.
In this post, I look at using moving averages to smooth time series data. This method is the simplest form of smoothing. In future posts, I’ll explore more complex ways of smoothing.
What are Moving Averages?
Moving averages are a series of averages calculated using sequential segments of data points over a series of values. They have a length, which defines the number of data points to include in each average.
One-sided moving averages
One-sided moving averages include the current and previous observations for each average. For example, the formula for a moving average (MA) of X at time t with a length of 7 is the following:
In the graph, the circled one-sided moving average uses the seven observations that fall within the red interval. The subsequent moving average shifts the interval to the right by one observation. And, so on.
Centered moving averages
Centered moving averages include both previous and future observations to calculate the average at a given point in time. In other words, centered moving averages use observations that surround it in both directions and, consequently, are also known as two-sided moving averages. The formula for a centered moving average of X at time t with a length of 7 is the following:
In the plot below, the circled centered moving average uses the seven observations in the red interval. The next moving average shifts the interval to the right by one.
Centered intervals work out evenly for an odd number of observations because they allow for an equal amount of observations before and after the moving average. However, when you have an even length, the calculations must adjust for that by using a weighted moving average. For example, the formula for a centered moving average with a length of 8 is as follows:
For a length of 8, the calculations incorporate the formula for a length of 7 (t-3 through t+3). Then, it extends the segment by one observation in both directions (t-4 and t+4). However, those two observations each have half the weight, which yields the equivalent of 7 + 2*0.5 = 8 data points.
Using Moving Averages to Reveal Trends
Moving averages can remove seasonal patterns to reveal underlying trends. In future posts, I’ll write more about time series components and incorporating them into models for accurate forecasting. For now, we’ll work through an example to visually assess a trend.
When there is a seasonal pattern in your data and you want to remove it, set the length of your moving average to equal the pattern’s length. If there is no seasonal pattern in your data, choose a length that makes sense. Longer lengths will produce smoother lines.
Note that the term “seasonal” pattern doesn’t necessarily indicate a meteorological season. Instead, it refers to a repeating pattern that has a fixed length in your data.
Time Series Example: Daily COVID-19 Deaths in Florida
For our example, I’ll use daily COVID-19 deaths in the State of Florida. The time series plot below displays a recurring pattern in the number of daily deaths.
This pattern likely reflects a data artifact. We know the coronavirus does not operate on a seven-day weekly schedule! Instead, it must reflect some human-based scheduling factor that influences when causes of death are determined and recorded. Some of these activities must be less likely to occur on weekends because the lowest day of the week is almost always Sunday, and weekends, in general, tend to be low. Tuesdays are often the highest day of the week. Perhaps that is when the weekend backlog shows up in the data?
Because of this seasonal pattern, the number of recorded deaths for a particular day depends on the day of the week you’re evaluating. Let’s remove this season pattern to reveal the underlying trend component. The original data are from Johns Hopkins University. Download my Excel spreadsheet: Florida Deaths Time Series.
The graph displays one-sided moving averages with a length of 7 days for these data. Notice how the seasonal pattern is gone and the underlying trend is visible. Each moving average point is the daily average of the past seven days. We can look at any date, and the day of the week no longer plays a role. We can see that the trend increases up to April 17, 2020. It plateaus, with a slight decline, until around June 22nd. Since then, there is an upward trend that appears to steepen at the end.
Smoothing time series data helps reveal the underlying trends in your data. That process can aid in the simple visual assessment of the data, as seen in this article. However, it can also help you fit the best time series model to your data. The moving average is a simple but very effective calculation!
The Birthday Problem in statistics asks, how many people do you need in a group to have a 50% chance that at least two people will share a birthday? Go ahead and think about that for a moment. The answer surprises many people. We’ll get to that shortly.
In this post, I’ll not only answer the birthday paradox, but I’ll also show you how to calculate the probabilities for any size group, run a computer simulation of it, and explain why the answer to the Birthday Problem is so surprising. [Read more…] about Answering the Birthday Problem in Statistics
Use a variances test to determine whether the variability of two groups differs. In this post, we’ll work through a two-sample variances test that Excel provides. Even if Excel isn’t your primary statistical software, this post provides an excellent introduction to variance tests. Excel refers to this analysis as F-Test Two-Sample for Variances. [Read more…] about How to Test Variances in Excel
Use two-way ANOVA to assess differences between the group means that are defined by two categorical factors. In this post, we’ll work through two-way ANOVA using Excel. Even if Excel isn’t your main statistical package, this post is an excellent introduction to two-way ANOVA. Excel refers to this analysis as two factor ANOVA. [Read more…] about How to do Two-Way ANOVA in Excel
Use one-way ANOVA to determine whether the means of at least three groups are different. Excel refers to this test as Single Factor ANOVA. This post is an excellent introduction to performing and interpreting one-way ANOVA even if Excel isn’t your primary statistical software package. [Read more…] about How to do One-Way ANOVA in Excel
Excel can perform various statistical analyses, including t-tests. It is an excellent option because nearly everyone can access Excel. This post is a great introduction to performing and interpreting t-tests even if Excel isn’t your primary statistical software package.
In this post, I provide step-by-step instructions for using Excel to perform t-tests. Importantly, I also show you how to select the correct form of t-test, choose the right options, and interpret the results. I also include links to additional resources I’ve written, which present clear explanations of relevant t-test concepts that you won’t find in Excel’s documentation. And, I use an example dataset for us to work through and interpret together! [Read more…] about How to do t-Tests in Excel