Time Series Analysis Introduction

By Jim Frost

Time series analysis tracks characteristics of a process at regular time intervals. It’s a fundamental method for understanding how a metric changes over time and forecasting future values. Analysts use time series methods in a wide variety of contexts.

Area	Examples
Business	Manufacturing output, sales, prices
Economics	GDP, stock market, and other indicators
Social Sciences	Population trends, political changes
Medicine	Disease, birth, and death rates
Physical Sciences	Climate and environmental changes

In this post, I cover the basics of time series analysis. I’ll define this type of data, explain what we can learn from it, and touch on more advanced methods that I will explore in future posts. For this introduction, I focus on using time series plots to highlight what you can learn from these data.

Time Series Data

A time series is a set of measurements that occur at regular time intervals. For this type of analysis, you can think of time as the independent variable, and the goal is to model changes in a characteristic (the dependent variable).

For example, you might measure the following:

Hourly consumption of energy
Daily sales
Quarterly profits
Annual changes in a country’s population

Each of these examples tracks a single metric at regular time points. Use subject-area knowledge to choose the appropriate time interval that allows you to answer your research questions.
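To make "a single metric at regular time points" concrete, here is a minimal sketch in plain Python. The sales figures, the start date, and the helper name `is_regular` are made up for illustration; they do not come from any dataset in this post.

```python
from datetime import date, timedelta

# Hypothetical daily sales figures (made-up values for illustration only).
start = date(2024, 1, 1)
sales = [120, 135, 128, 150, 160, 210, 205]

# Pair each measurement with its time point: one value per day.
series = [(start + timedelta(days=i), value) for i, value in enumerate(sales)]


def is_regular(points):
    """Return True when consecutive time points are equally spaced."""
    gaps = {later[0] - earlier[0] for earlier, later in zip(points, points[1:])}
    return len(gaps) <= 1


for day, value in series:
    print(day.isoformat(), value)
print("Regular intervals:", is_regular(series))
```

Dropping one day from the middle of `series` would make `is_regular` return `False`, which is a quick way to spot gaps before analyzing the data.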

Compared to Panel Data

Panel data contain observations of multiple characteristics measured over time for the same set of subjects, such as people, businesses, or countries.

For panel data, you need a time value and an additional characteristic to identify a particular observation. For example, if you have panel data that tracks sales for a group of companies over time, you’ll need a time value and a company identifier to find an individual observation.

Time series data are a sub-type of the broader class of panel data. These series only track a single characteristic. Consequently, you only need the time value. For example, for the annual number of breast cancer cases time series, you just need to know the year to identify the observation.
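The difference in how you identify an observation can be sketched with two plain Python dictionaries. The company names and all numbers below are invented for illustration.

```python
# Hypothetical panel data: sales for several companies over time.
# Each observation needs BOTH a time value and a company identifier.
panel_sales = {
    (2022, "Acme"): 410,
    (2022, "Globex"): 385,
    (2023, "Acme"): 450,
    (2023, "Globex"): 402,
}

# Hypothetical time series: a single characteristic tracked over time.
# The time value alone identifies each observation.
annual_cases = {2021: 1510, 2022: 1544, 2023: 1589}

print(panel_sales[(2023, "Acme")])  # lookup needs year AND company
print(annual_cases[2023])           # lookup needs only the year
```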

Compared to Cross-Sectional Data

Cross-sectional data describes a set of people, items, companies, etc. at a single point in time. The goal is to determine the differences between the subjects at one time. For example, a cross-sectional study might assess wages by education level to understand the impact of education. Time does not play a role in this type of analysis.

Related post: Guide to Data Types and How to Graph Them

Goals of Time Series Analysis

Time series analysis seeks to understand patterns in changes over time. Statisticians refer to these patterns as the components of a time series and they include trends, cycles, and irregular movements. When these components exist in a time series, the model must account for these patterns to generate accurate forecasts, such as future sales, GDP, and global temperatures.

In addition to these patterns, time series models typically incorporate the fact that time flows in one direction. Past events can influence future observations but not the other way around. Additionally, events close together in time often have a stronger association than more distant observations. While these ideas are obvious to us, statisticians had to build them into how these models work.

Like all data, time series data contain random fluctuations. This randomness can obscure the underlying patterns. Smoothing techniques cancel out these fluctuations to more clearly unveil the trends and cycles.
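One simple smoothing technique is a moving average, which replaces each point with the mean of its neighbors. The sketch below is only one of several possible smoothers, and it uses made-up values and a hypothetical helper named `moving_average`.

```python
def moving_average(values, window):
    """Centered moving average with an odd window.

    Returns len(values) - window + 1 smoothed points, one for each
    position where the full window fits inside the data.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up noisy measurements with an underlying upward tendency.
noisy = [10, 14, 9, 15, 12, 18, 13, 19, 16, 22]
smoothed = moving_average(noisy, window=3)
print([round(v, 2) for v in smoothed])
```

A wider window cancels out more of the random fluctuation but also reacts more slowly to real changes, so the window length is a judgment call.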

In other posts, I cover the modeling and smoothing techniques.

Using Graphs to Understand the Components of a Time Series

For this introductory post, we’ll stick with the simple time series plot, and save the smoothing and modeling for later posts. Despite their simplicity, these graphs effectively illustrate how metrics change over time.

You can improve your understanding of the following components by assessing additional graphs, such as the autocorrelation and partial autocorrelation functions.
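As a rough sketch of what an autocorrelation function measures, the snippet below computes the usual sample autocorrelation at a given lag: the correlation between the series and a copy of itself shifted by that many time steps. The helper name `autocorrelation` and the repeating test series are assumptions made for illustration.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(values)
    mean = sum(values) / n
    # Total variation around the mean (must be non-zero).
    denominator = sum((v - mean) ** 2 for v in values)
    # Co-variation between each point and the point `lag` steps earlier.
    numerator = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Made-up series that repeats every 4 observations.
cyclic = [0, 1, 0, -1] * 6

for lag in range(5):
    print(lag, round(autocorrelation(cyclic, lag), 3))
```

For this series, the value at lag 4 is strongly positive and the value at lag 2 is strongly negative, which is the signature of a cycle that repeats every four observations.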

Time series plots are a specialized type of line chart.

Trends

Trends are a long-term tendency of a time series to either increase or decrease. For example, the two plots below show trends over time.

At a glance, we can determine that air pollution deaths are decreasing over the years. On the other hand, cases of breast cancer in men and women are increasing over the years. If we need to generate forecasts for future years, our model would include these trends.
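One simple way to include a trend in a model is to fit a straight line to the series by least squares and extend it forward. This is only a sketch under the assumption that the trend is roughly linear; the yearly values and the helper name `linear_trend` are invented for illustration.

```python
def linear_trend(values):
    """Least-squares slope and intercept against time index 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


# Made-up yearly counts with an upward tendency.
yearly = [100, 104, 109, 113, 120, 124]
slope, intercept = linear_trend(yearly)

# Naive forecast for the next time point: extend the fitted line one step.
next_value = intercept + slope * len(yearly)
print(f"slope per year: {slope:.2f}, forecast: {next_value:.1f}")
```

A positive slope corresponds to an increasing trend and a negative slope to a decreasing one.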

Seasonal Cycles

Seasonal cycles are patterns that repeat within a time series. Typically, they have a fixed length. Understanding seasonal cycles requires using your subject-area knowledge. For example, an analyst might note a weekly cycle where weekends tend to have higher retail sales than weekdays due to a higher number of store patrons. Alternatively, sales of summer products will tend to peak in the summer months and decline in the other months. To model these data accurately, analysts need to understand these cycles.
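The weekly retail example can be sketched by averaging the values that fall at the same position in each cycle. The three weeks of sales below are made-up numbers, and `seasonal_means` is a hypothetical helper that assumes the cycle length is known in advance.

```python
def seasonal_means(values, period):
    """Average of the values at each position within the cycle."""
    means = []
    for position in range(period):
        slot = values[position::period]  # every value at this cycle position
        means.append(sum(slot) / len(slot))
    return means


# Three weeks of made-up daily sales, Monday through Sunday.
daily_sales = [
    100, 98, 103, 101, 110, 150, 145,
    102, 97, 105, 99, 112, 155, 148,
    101, 99, 104, 100, 108, 152, 147,
]

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
for day, mean in zip(days, seasonal_means(daily_sales, period=7)):
    print(day, round(mean, 1))
```

In this invented dataset, the Saturday and Sunday averages sit well above the weekday averages, which is the kind of fixed-length pattern a seasonal model needs to capture.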

The examples below illustrate two datasets with seasonal cycles.

In the plot above, food production by month has a repeating cycle. However, in these data, there is no overall trend. The cycle repeats but there is no long-term tendency. To generate forecasts for these data, our model would need to account for the cycle but not worry about a longer-term increase or decrease.

The graph for trade activity by month displays both a seasonal cycle and a long-term trend. There’s a repeating pattern and a tendency to increase over time. For these data, our model would need to account for both components.
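When a series has both components, one classical approach is an additive decomposition: estimate the trend with a centered moving average that spans one full cycle, then average what is left over at each position in the cycle. The sketch below uses a synthetic quarterly series built from a known trend and a known seasonal pattern, so the numbers and function names are assumptions for illustration. It handles only even cycle lengths.

```python
def centered_moving_average(values, period):
    """Trend estimate: centered moving average over one full (even) cycle."""
    half = period // 2
    trend = [None] * len(values)  # no estimate near the ends of the series
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        # The two end points get half weight so the window covers one cycle.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_component(values, trend, period):
    """Average detrended value at each position within the cycle."""
    sums = [0.0] * period
    counts = [0] * period
    for t, (value, level) in enumerate(zip(values, trend)):
        if level is None:
            continue
        sums[t % period] += value - level
        counts[t % period] += 1
    return [s / c for s, c in zip(sums, counts)]


# Synthetic quarterly series: rising trend plus a repeating seasonal pattern.
pattern = [5, -3, -4, 2]
quarterly = [50 + 2 * t + pattern[t % 4] for t in range(12)]

trend = centered_moving_average(quarterly, period=4)
seasonal = seasonal_component(quarterly, trend, period=4)
print("trend:", trend)
print("seasonal:", seasonal)
```

Because the synthetic series contains no random noise, the estimates match the pattern used to build it. Real data would give only approximate estimates.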

While time series plots are straightforward, they can yield a great deal of information about how a metric changes over time.

Comments

Saumya says

October 24, 2021 at 8:14 am

Hi Jim, I have learnt a lot from these posts of yours. I have a point of confusion. Say you have to choose the frequency of the time series to use for your linear regression model: daily, weekly, or monthly. Would you say your choice can affect the performance of your model? What is the relevance of the time-dimension granularity of time series data to the performance of, say, a multiple linear regression model?

Berns Buenaobra says

May 17, 2021 at 3:27 am

What would also be interesting is if there is some way to extract a not-so-obvious seasonality when the data are obscured by random noise.

Petra Valickova says

April 19, 2021 at 7:14 pm

Hi Jim, I wish I had known you before!

One question, thinking about a classical growth regression… In this case a classical growth regression augmented for a measure of financial development (G = α+ βF+Xγ+ δ+η+ϵ), where G stands for GDP growth rate or per capita GDP growth rate, F represents a measure of financial development (some measure of the size or activity of financial intermediaries or some measure of stock market development) and X is a vector of control variables to account for other factors considered important in the growth process. Regarding model specification, I’ve seen some researchers using nominal GDP growth rate – i.e. these researchers did not adjust for inflation to derive real GDP growth rate. Can you think about a reason why this is the case? Could it be for example if you have the measure of financial development also in nominal terms, then your dependent variable should also be in nominal terms (i.e. growth rate of nominal GDP)? Or maybe for some econometric techniques nominal terms are needed?

Thank you!!
Petra

- Jim Frost says
  
  April 20, 2021 at 2:34 pm
  
  Hi Petra,
  
  That is a bit puzzling. Perhaps they incorporate inflation into the model using some other variable? Economic analysis isn’t my area of expertise, so I don’t have much insight.
  
Khyati says

March 3, 2021 at 4:17 am

Hey Jim, great article. I love how you explain difficult concepts in a lucid manner.
Could you please explain Time series models such as AR, MA, ARMA, ARCH, GARCH and ARIMA?
Would be really helpful, thanks.

- Jim Frost says
  
  March 3, 2021 at 5:27 pm
  
  Hi Khyati,
  
  Thanks so much! I will explain the other time series models. I’m building up the pieces as I go forward. Note that I do explain MA (moving average). Stay tuned for the other types of models!
  
Tony says

March 2, 2021 at 5:18 pm

Hi Jim,

Yes, it helps. I appreciate it.

Thanks,

Tony

jeremy says

March 1, 2021 at 12:51 pm

Hi Jim, what if you’re measuring, say, cholesterol levels in the same group of people over time, but some of them have missing data at some observation points because they didn’t attend their appointments. With some methods (like ANOVA I believe) you’d have to throw out all the observations for persons with any missing data. Would missing data be a problem with the methods you describe here?

- Jim Frost says
  
  March 2, 2021 at 2:38 am
  
  Hi Jeremy,
  
  Yes, you can’t have any missing values for these methods. However, if you don’t have too many missing, you can use moving averages or exponential smoothing to estimate the missing values. It’s not ideal but it can help! There are even some specialized missing value imputation methods that you can use. I haven’t used those specifically but I have used averages and even regression analysis to estimate missing values.
  
Tony says

March 1, 2021 at 8:18 am

Hi Jim,

This was a great read. Could exponential smoothing or a regression analysis be used to forecast if a client is going to default on invoice payment terms? If so, how would the data be set up for both exponential smoothing and a regression analysis?

- Jim Frost says
  
  March 2, 2021 at 2:19 am
  
  Hi Tony,
  
  That sounds like a binary logistic regression model to me. You’d use default/non-default as the binary dependent variable. Then include whatever independent variables you have, allowing you to predict the probability that a client will default. For exponential smoothing, you could track defaults over time and predict the number of defaults overall, but it wouldn’t be client specific. The type of exponential smoothing depends on the characteristics of the data, as I show in this post.
  
  I hope that helps!
  
Priyalal says

December 3, 2020 at 12:48 am

Please include practical examples of EViews software analysis for time series models too (ARIMA, GARCH, VAR, and VEC).

Perry Gonen says

August 18, 2020 at 10:41 am

I second everyone who requested a book on time series. ARIMA, GARCH, EGARCH. Dynamic models and Stationary, AR, ARDL model, VAR and VEC models

- Jim Frost says
  
  August 19, 2020 at 11:06 pm
  
  Hi Perry, thanks for your enthusiasm! I think it’s safe to say that I’ll be writing a time series book covering a number of those models.
  
Jerry says

August 3, 2020 at 1:44 pm

I second that emotion !

Jacob Willie says

July 9, 2020 at 3:17 am

Great post, thanks!

Oladayo says

July 8, 2020 at 4:19 pm

Thanks Jim. Very helpful, I got wide knowledge now.

Thin Thin Swe says

July 8, 2020 at 11:04 am

Thank you. Nice post!!

hayet says

July 6, 2020 at 5:52 pm

Hello. What about daily series, ARCH and GARCH models, and their application in EViews software?

Ramesh Chandra Das says

July 6, 2020 at 7:16 am

Nice post!! Kindly write a time series book using ARIMA, GARCH, EGARCH. It would be more beneficial for us.

Janine Zitianellis says

July 6, 2020 at 3:28 am

I second that!

Fedor Tarasov says

July 6, 2020 at 2:45 am

Hi, can you please publish a book on time series analysis? ARIMA, GARCH, etc.

Otieno Joyce Akinyi says

July 6, 2020 at 2:35 am

Informative post. Looking forward to more

Otieno Joyce Akinyi says

July 6, 2020 at 2:31 am

I think in time series data, the characteristic is measured at equal time gaps, while in longitudinal analysis you have repeated measures of the characteristic.

Carlos Mora says

July 6, 2020 at 2:03 am

What software do you recommend for time series analysis? I use Statgraphics, Centurion, because it has an interactive interface.

Jane says

July 6, 2020 at 12:56 am

Hi, Jim,
I am often asked about using time series vs. longitudinal analysis when there are measurements over time (maybe 3-6 measurements). I was taught that time series data usually covered many measurements over a short period, as in stock market data that may change minute by minute or day by day so that trends can be forecast over a few weeks or months, while longitudinal analysis was more appropriate for fewer measurements over a longer period (growth, for example). Your examples extend over 5 years. Obviously my understanding is not right. How would you distinguish the two approaches? Thanks in advance for straightening me out on that question.

Guruprasad says

July 5, 2020 at 11:43 pm

Can you bring out a book on analyzing time series? It would be a great help.

Ernest Ng'ang'a says

July 5, 2020 at 11:20 pm

Great post. How do you design a data collection sheet that allows one to collect time series data?
