Excel can calculate correlation coefficients and a variety of other statistical analyses. Even if you don’t use Excel regularly, this post is an excellent introduction to calculating and interpreting correlation.
In this post, I provide step-by-step instructions for having Excel calculate Pearson’s correlation coefficient, and I’ll show you how to interpret the results. Additionally, I include links to relevant statistical resources I’ve written that provide intuitive explanations. Together, we’ll analyze and interpret an example dataset!
Before proceeding, check that Excel’s Data Analysis ToolPak is installed. On the Data tab, find Data Analysis, as shown below.
If you don’t see Data Analysis, install that ToolPak. Learn how to add it in my post about using Excel to perform t-tests. It’s free!
Correlation Basics
When a correlation between a pair of variables exists, it indicates that as one variable changes, the other variable tends to change in a particular direction. There are two important properties to evaluate, direction and strength.
The direction of the correlation corresponds to the coefficient’s sign, positive or negative. As one variable increases, does the other variable tend to increase or decrease?
- A positive correlation indicates that as one variable increases, the other tends to increase.
- A negative correlation signifies that as one variable increases, the other tends to decrease.
The values indicate the strength of the correlation and they can range from -1 to +1. The closer the value is to -1 or 1, the stronger the relationship. Values near zero represent no relationship.
For a more thorough understanding of correlations, please read my post about Interpreting Correlation Coefficients. I provide examples, graphs that display negative and positive relationships of different strengths, important caveats and warnings, and so on.
Learn how to calculate correlation in my post, Correlation Coefficient Formula Walkthrough.
Example Height and Weight Dataset for Correlation
This height and weight dataset is the same one I use for my post about Descriptive Statistics in Excel, so you’ll see that information in the Excel file as well. Download the Excel file that contains the data for this example: HeightWeightCorr.
Before performing statistical analyses, it’s always helpful to graph your data. Below is the scatterplot of the Height and Weight dataset.
The chart displays an upward slope. In other words, as a person’s height increases, their weight also tends to increase. We have a positive correlation.
How strong is the relationship?
If you pick an individual height on the scatterplot, say near 1.5M, you’ll see weights that range from about 35kg to nearly 60kg. This spread indicates that the correlation is not perfect. You can also find taller individuals that weigh less than shorter individuals. For example, just by looking at data points on the graph, I see an individual at about 1.63M who weighs nearly 30kg less than a shorter person at 1.52M.
While it’s not a perfect association, there’s a definite tendency for taller people to weigh more. Visually, it appears to be a moderately strong association. That’s a subjective assessment.
Let’s calculate the correlation using Excel to get an objective measure!
Related post: Guide to Data Types and How to Graph Them
Using Excel to Calculate Correlation
For this example, we’ll assess two variables, the height and weight of preteen girls. I collected these data during an actual experiment. To use the correlation feature in Excel, arrange your data in columns or rows. I have my data in columns, as shown in the snippet below.
In Excel, click Data Analysis on the Data tab, as shown above. In the Data Analysis popup, choose Correlation, and then follow the steps below.
Step-by-Step Instructions for Excel’s Correlation Analysis
- Under Input Range, select the range for the variables that you want to analyze. You must include at least two variables for correlation and they must form a contiguous block in your worksheet. Excel performs all possible pairwise correlations for the variables you specify.
- In Grouped By, choose how your variables are organized. I always include one variable per column because this format is standard across statistical software. Alternatively, you can include one variable per row.
- Check the Labels in first row checkbox when you have meaningful variable names in row 1. This option makes the output easier to interpret.
- In Output options, choose where you want Excel to display the results. I like to put them in a separate worksheet.
- Click OK.
For our example, fill in the dialog box as shown below.
Interpreting Excel’s Correlation Results
After Excel creates the statistical output, I autofit the columns for clarity.
We’re interested in the relationship between Weight kg and Height M. The two perfect values of 1 are each variable correlated with itself!
The height by weight correlation is 0.705. The positive coefficient is consistent with the upward slope we saw on the scatterplot. The strength of the association is 0.705, which is moderate. It definitely exists. Taller people tend to weigh more. However, as we saw on the graph, it’s not close to being a perfect correlation. A tendency exists, but there are exceptions to the trend.
For the height and weight data, there’s undoubtedly an overall trend, or tendency, for taller people to weigh more.
In this post, I use Excel to calculate Pearson’s correlation coefficient. To see how I use Excel to calculate Spearman’s rank order correlation, read my post about Spearman’s Correlation Explained.
Comments and Questions