
Statistics By Jim

Making statistics intuitive


Correlation Coefficient Formula Walkthrough

By Jim Frost 1 Comment

Pearson’s correlation coefficient formula produces a number ranging from -1 to +1 that quantifies the strength and direction of the linear relationship between two continuous variables. A correlation of -1 means a perfect negative linear relationship, +1 represents a perfect positive linear relationship, and 0 indicates no linear relationship.

In this post, you’ll learn about the correlation coefficient formula and gain insight into how it works. Then we’ll work through an example calculation so you learn how to find the correlation coefficient.

For more information specifically about interpretations, read my post, Interpreting Correlation Coefficients.

Pearson’s Correlation Coefficient Formula

The equation might initially seem daunting, but we’re here to demystify it.

So, let’s take a look at the formula itself. The letter r represents Pearson’s correlation coefficient calculated from a sample, while the Greek letter ρ (rho) denotes the correlation in the population.

The correlation coefficient formula is the following fraction:

Correlation coefficient formula:

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n - 1)\, s_x s_y}$$

Where:

  • Xᵢ and Yᵢ represent the individual values of variables X and Y.
  • X̄ and Ȳ denote their respective means.
  • n represents the number of paired observations.
  • sx and sy represent the sample standard deviations of X and Y.
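To make the formula concrete, here is a minimal Python sketch that follows it term by term. The function name `pearson_r` is a hypothetical helper for illustration, not part of any library; it assumes paired lists of numbers with at least two observations.

```python
import math

def pearson_r(x, y):
    """Pearson's r: sum of cross-products / ((n - 1) * sx * sy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of the products of paired deviations from the means
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sample standard deviations use the n - 1 divisor
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    # Denominator: degrees of freedom times both standard deviations.
    # A constant variable has s = 0, which raises ZeroDivisionError here.
    denominator = (n - 1) * s_x * s_y
    return numerator / denominator

print(round(pearson_r([3, 5, 2, 7, 4], [70, 80, 60, 90, 75]), 3))  # 0.988
```

The sample data are the hours-studied and test-score values from the worked example later in this post.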

By understanding the correlation formula and how it works as a fraction, you can gain insight into how it assesses the data.

You can also use this formula to calculate Spearman’s correlation: apply it to the ranks of the data rather than to the raw values.
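As a rough sketch of that idea, the Python below converts each variable to ranks and then applies the Pearson formula. The helper names are hypothetical, and giving tied values the average of their ranks is an assumption (it is the common convention, but check what your software does).

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r: sum of cross-products / ((n - 1) * sx * sy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return num / ((n - 1) * sx * sy)

def spearman_rho(x, y):
    """Spearman's correlation: Pearson's formula applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # illustrative data: always increasing, but not linearly
print(round(spearman_rho(x, y), 3))  # 1.0
print(round(pearson_r(x, y), 3))     # 0.795
```

Because y always rises with x, the ranks line up perfectly and Spearman’s correlation is 1, while Pearson’s correlation on the raw values is lower because the relationship is not a straight line.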

To find the correct solution, you must use the correct order of operations in the formula. If you need a refresher on that, read my post PEMDAS Explained: Order of Operations in Math.
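For example, the whole product (n – 1) × sx × sy belongs in the denominator. The short Python sketch below, using the worked example’s data, shows what happens if you type the formula without parentheses around the denominator: division and multiplication are evaluated left to right, so the standard deviations end up multiplying the result instead of dividing it.

```python
import statistics

hours = [3, 5, 2, 7, 4]
scores = [70, 80, 60, 90, 75]
n = len(hours)
mean_x = statistics.mean(hours)
mean_y = statistics.mean(scores)
numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_x = statistics.stdev(hours)   # sample standard deviation (n - 1 divisor)
s_y = statistics.stdev(scores)

# Correct: parentheses keep (n - 1) * s_x * s_y together as the denominator
r_correct = numerator / ((n - 1) * s_x * s_y)

# Wrong: evaluated left to right as ((numerator / (n - 1)) * s_x) * s_y
r_wrong = numerator / (n - 1) * s_x * s_y

print(round(r_correct, 2))  # 0.99
print(round(r_wrong))       # 457, impossible for a correlation
```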

How the Correlation Coefficient Formula Works

Numerator

The correlation formula works by comparing each variable’s observed values to their means in the numerator, as shown below.

$$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

The products in the correlation coefficient formula’s numerator are predominantly positive, pushing the sum upward, when the following conditions tend to occur:

  • Above-average X values correspond with above-average Y values.
  • Below-average X values correspond with below-average Y values.

A positive sum in the numerator produces a positive correlation.

Conversely, when above-average values of one variable tend to correspond with below-average values of the other, the products are predominantly negative and pull the total down. A negative sum in the numerator produces a negative correlation.

In this manner, the correlation formula assesses the co-variability of two variables around their respective means.
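A minimal Python sketch of this behavior computes only the numerator for two small, invented datasets. The function name and data are hypothetical and chosen for illustration. Note that the rising pattern still contains one negative product; the sum is positive because positive products dominate.

```python
def cross_product_sum(x, y):
    """Numerator of Pearson's r: sum of products of paired deviations from the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 7]    # hypothetical: above-average x tends to pair with above-average y
falling = [9, 7, 6, 4, 2]   # hypothetical: above-average x tends to pair with below-average y

print(round(cross_product_sum(x, rising), 2))   # 10.0, positive sum
print(round(cross_product_sum(x, falling), 2))  # -17.0, negative sum
```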

Denominator

$$(n - 1)\, s_x s_y$$

The denominator of the correlation coefficient formula divides the numerator by the product of the degrees of freedom (n – 1) and the two sample standard deviations. The denominator is positive whenever you have at least two observations and neither variable is constant. If either variable is constant, its standard deviation is zero, and the correlation is undefined.

The numerator can be positive or negative, but its absolute value can never exceed the denominator (a consequence of the Cauchy-Schwarz inequality). That is how the equation scales correlation coefficients to fit the range of -1 to +1.
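The Python sketch below illustrates this bound. It splits the formula into its numerator and denominator (the helper name is hypothetical), shows that perfect straight lines reach exactly +1 and -1 up to floating-point rounding, and checks the bound on simulated random data.

```python
import math
import random

def numerator_and_denominator(x, y):
    """Return the two halves of Pearson's formula for paired data x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return numerator, (n - 1) * s_x * s_y

# Perfect straight lines reach the limits of the range
x = [1, 2, 3, 4, 5]
num, den = numerator_and_denominator(x, [2 * a + 1 for a in x])
print(round(num / den, 6))   # 1.0
num, den = numerator_and_denominator(x, [10 - 3 * a for a in x])
print(round(num / den, 6))   # -1.0

# For simulated random data, |numerator| never exceeds the denominator
random.seed(0)
for _ in range(1000):
    xs = [random.gauss(0, 1) for _ in range(10)]
    ys = [random.gauss(0, 1) for _ in range(10)]
    num, den = numerator_and_denominator(xs, ys)
    assert abs(num) <= den * (1 + 1e-12)  # small tolerance for rounding
```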

Use my Correlation Coefficient Calculator to find the relationship in your data and graph it!

Covariance vs Correlation

Before working through the correlation coefficient formula, let’s look at how this equation resembles the covariance formula and where the two crucially differ.

If you keep the correlation coefficient formula’s numerator but divide it only by (n – 1), you get the sample covariance, as shown below.

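$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n - 1)\, s_x s_y} = \frac{\operatorname{cov}(X, Y)}{s_x s_y}$$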

Dividing by the extra sx × sy term in the denominator takes you from covariance to correlation. That’s the difference between the two statistical measures. That “extra bit” is the product of the sample standard deviations of X and Y, and it does two critical things.

First, it takes the -∞ to +∞ covariance range and scales it to the correlation coefficient’s easier-to-interpret -1 to +1 range.

Second, standard deviations are expressed in the original data units. The numerator carries the units of X multiplied by the units of Y, so dividing by both standard deviations cancels those units. Consequently, unlike the covariance, the correlation coefficient is unitless and doesn’t change when you change the measurement units.

Suppose you are assessing the relationship between height and weight. If you were to change the height measurements from inches to centimeters, that would affect the covariance but not the correlation. You can even compare correlation coefficients between entirely dissimilar studies.

In summary, the standardized range and unitless nature make correlation far easier to interpret than covariance.

Learn more about Covariance: Definition, Formula & Example and Covariance vs Correlation: Understanding the Differences.

How to Find the Correlation Coefficient: Worked Example

Let’s work through an example using the correlation formula to illustrate how to find the coefficient. Suppose we want to evaluate the relationship between the number of hours studied (X) and the test scores (Y) obtained by a group of five students. The data are below.

Example dataset:

Hours studied (X)    Test score (Y)
3                    70
5                    80
2                    60
7                    90
4                    75

For simplicity, I’ll split the calculations between the numerator and denominator and then divide them in the final step.

Numerator

To start, we need to find the mean of both variables to use in the correlation formula.

X̄ = (3 + 5 + 2 + 7 + 4) / 5 = 4.2

Ȳ = (70 + 80 + 60 + 90 + 75) / 5 = 75

Then, follow these steps to calculate the numerator in the correlation coefficient formula:

  1. Calculate the differences between the observed X and Y values and each variable’s mean.
  2. Multiply those differences for each X and Y pair.
  3. Sum those products.

Worksheet for the numerator of the correlation coefficient formula:

X    Y     X - X̄    Y - Ȳ    (X - X̄)(Y - Ȳ)
3    70    -1.2     -5        6.0
5    80     0.8      5        4.0
2    60    -2.2    -15       33.0
7    90     2.8     15       42.0
4    75    -0.2      0        0.0
Sum                          85.0

Notice that the product column contains no negative values. Above-average X-values correspond with above-average Y-values, and corresponding below-average values also produce positive products because the product of two negatives is a positive. The one student with an exactly average score (Y = 75) contributes a product of zero.

These products sum to 85, a positive total for the numerator, so we know we’ll have a positive correlation coefficient. We’ll use this total in the numerator of the correlation formula to calculate the coefficient’s value.
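The three numerator steps above can be sketched in a few lines of Python. This is an illustration using the example data, not the tool used to build the worksheet.

```python
hours = [3, 5, 2, 7, 4]
scores = [70, 80, 60, 90, 75]
n = len(hours)

mean_x = sum(hours) / n   # 4.2
mean_y = sum(scores) / n  # 75.0

products = []
for x, y in zip(hours, scores):
    dx = x - mean_x            # step 1: deviation of X from its mean
    dy = y - mean_y            # step 1: deviation of Y from its mean
    products.append(dx * dy)   # step 2: multiply the paired deviations

numerator = sum(products)      # step 3: sum the products
print(round(numerator, 2))               # 85.0
print(all(p >= 0 for p in products))     # True: no negative products
```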

Denominator

For the denominator of the correlation coefficient formula, we need to calculate the product of the degrees of freedom, the standard deviation of X, and the standard deviation of Y:

(n – 1) * sx * sy

n is the number of paired observations, usually the number of rows in your dataset without missing values. We have 5 observations, so n – 1 = 4.

I cover how to calculate the standard deviation elsewhere. So, for this example, I’ll have Excel calculate the sample standard deviations for X and Y, which are 1.92 and 11.18, respectively.
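If you’d rather not use Excel, here is a minimal Python sketch of the sample standard deviation (squared deviations summed, divided by n – 1, then square-rooted). The helper `sample_sd` is hypothetical; Python’s `statistics.stdev` also uses the n – 1 divisor, as does Excel’s STDEV.S function.

```python
import math
import statistics

hours = [3, 5, 2, 7, 4]
scores = [70, 80, 60, 90, 75]

def sample_sd(values):
    """Sample standard deviation using the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

print(round(sample_sd(hours), 2), round(sample_sd(scores), 2))                # 1.92 11.18
print(round(statistics.stdev(hours), 2), round(statistics.stdev(scores), 2))  # 1.92 11.18
```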

We just multiply all these values together for the denominator.

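4 * 1.9235 * 11.1803 ≈ 86.02

This uses the unrounded standard deviations. Multiplying the rounded values (4 * 1.92 * 11.18) gives 85.86, so carry extra decimal places until the final step.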
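A short Python sketch makes the rounding effect visible. It computes the denominator with full-precision standard deviations and again with the two-decimal values.

```python
import statistics

hours = [3, 5, 2, 7, 4]
scores = [70, 80, 60, 90, 75]
n = len(hours)

s_x = statistics.stdev(hours)    # about 1.9235
s_y = statistics.stdev(scores)   # about 11.1803

denominator = (n - 1) * s_x * s_y
print(round(denominator, 2))              # 86.02
print(round((n - 1) * 1.92 * 11.18, 2))   # 85.86 when the SDs are rounded first
```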

Calculating the Correlation

At this point of the correlation coefficient formula, we just divide the numerator by the denominator to find the coefficient!

Final calculation: r = 85 / 86.02 ≈ 0.99

For these data, the correlation between hours of studying and test scores is 0.99 (0.988 before rounding). That’s a strong positive relationship: students who studied more tended to score higher. This correlation is unrealistically high, but these are made-up data.


Comments

  1. Larry says

    November 30, 2023 at 8:18 pm

    Hi, great blog, keep it up, sir. Is the formula above a formula for a sample standard deviation or for the population standard deviation? If the populations from where you draw the values Xi, Yi are infinite, don’t you have issues with the infinite sum converging?

Copyright © 2026 · Jim Frost