Covariance and correlation both evaluate the linear relationship between two continuous variables. While this description makes them sound similar, there are stark differences in how to interpret them.

Although these statistics are closely related, they are distinct concepts. How are they different?

In this post, learn about the differences between covariance vs correlation and what you can learn from each.

## Differences between Covariance vs Correlation

Covariance and correlation both assess the direction of the linear relationship between variables. However, correlation also tells us about its strength because its results are standardized across different units and datasets.

The covariance formula does not include standardization. Consequently, the values can range from negative to positive infinity, and the data’s measurement scales impact the results. Interpretation is difficult!

Correlation standardizes the results so it always falls between -1 and 1, and the results do not depend on the data’s scale. Interpretation is easy!

In summary, correlation is far more interpretable than covariance because it allows us to assess the direction and *strength* of relationships across different units.
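The standardization described above is just a division: correlation equals covariance divided by the product of the two variables' standard deviations. Below is a minimal Python sketch of that relationship using NumPy and a small illustrative dataset (not the article's data):

```python
import numpy as np

# Illustrative data (not the article's dataset): heights (m) and weights (kg)
height = np.array([1.60, 1.65, 1.70, 1.75, 1.80])
weight = np.array([55.0, 62.0, 66.0, 74.0, 80.0])

# Sample covariance: unstandardized, so its units are meters x kilograms
cov = np.cov(height, weight)[0, 1]

# Correlation divides the covariance by both standard deviations,
# which strips the units and bounds the result to [-1, 1]
corr = cov / (np.std(height, ddof=1) * np.std(weight, ddof=1))

print(cov)   # magnitude depends on the data's measurement scale
print(corr)  # dimensionless; matches np.corrcoef(height, weight)[0, 1]
```

Because the division cancels the units, the resulting correlation is comparable across datasets in a way the raw covariance is not.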

The table below summarizes the differences between covariance vs correlation. Then, I provide an example that brings these differences to life using real data!

| Feature | Covariance | Correlation |
|---|---|---|
| Interpretation | Direction of the relationship but not strength | Direction and strength |
| Range of Values | -∞ to +∞ | -1 to +1 |
| Scale Effects | Sensitive to variable scale | Insensitive |
| Standardization | Unstandardized | Standardized |
| Dimensionality | Units are the product of the two variables' units | Dimensionless; no units, enabling comparisons |

## Example of Covariance vs Correlation

Now, let’s examine the interpretability difference between covariance and correlation using real data. This example highlights covariance’s sensitivity to scale and the difficulty of interpreting it.

I have a dataset containing the heights and weights for 88 research participants. I converted the original metric measurements (meters and kilograms) to imperial units (feet and pounds). They’re the same subjects—I just converted the units. Here’s the Excel file with the dataset if you want to try it: Covariance.

Excel calculates the height and weight covariance for both scales as the following:

**Metric**: 0.57

**Imperial**: 4.09

Covariance produces different values for each scale even though the data originate from the physical properties of the same people. They were just measured differently. The differing results are not helpful! That’s a crucial shortcoming of covariance compared to correlation.

How strong is that relationship according to the covariances? It’s impossible to say! Both values are positive, so we know there is a positive relationship. But we don’t have a standard range to help interpret the two different values.

In contrast, the relationship between height and weight produces the exact same correlation coefficient of 0.71 using both metric and imperial units. A consistent answer is nice!

Additionally, because we can place 0.71 in the standard -1 to +1 range for correlation, we know a moderately strong positive correlation exists between height and weight. Crucially, these results don’t depend on our measurement system! Here’s a graph of the data showing the positive trend found by both covariance and correlation.
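The unit-conversion effect above is easy to reproduce. The sketch below uses a small illustrative dataset (not the article's 88-participant dataset) and converts metric measurements to imperial: the covariance changes, but the correlation is identical.

```python
import numpy as np

# Illustrative data, not the article's 88-participant dataset
height_m = np.array([1.60, 1.65, 1.70, 1.75, 1.80])
weight_kg = np.array([55.0, 62.0, 66.0, 74.0, 80.0])

# Convert the same measurements to imperial units (feet and pounds)
height_ft = height_m * 3.28084
weight_lb = weight_kg * 2.20462

cov_metric = np.cov(height_m, weight_kg)[0, 1]
cov_imperial = np.cov(height_ft, weight_lb)[0, 1]

corr_metric = np.corrcoef(height_m, weight_kg)[0, 1]
corr_imperial = np.corrcoef(height_ft, weight_lb)[0, 1]

# Covariance depends on the units; correlation does not
print(cov_metric, cov_imperial)    # two different numbers
print(corr_metric, corr_imperial)  # the same number twice
```

Rescaling each variable multiplies the covariance by the product of the conversion factors, while correlation's built-in standardization cancels those factors out.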

Learn more about Interpreting Correlation Coefficients.

## Summary

Generally, you’ll report correlations rather than covariances because they’re far more interpretable. However, despite its shortcomings, covariance has specialized applications in various fields.

In this post, I show you the real-world interpretability differences between covariance and correlation. For a more nuts-and-bolts look at how these differences stem from their respective formulas, please read my following posts:

Eric McIntyre says

I would be interested in an article on Causal Inference.

(I am reading the book Causal Inference by Paul R. Rosenbaum (2023, MIT Press), an easily accessible, nontechnical introduction to the subject.

A quote from the front matter:

“To find out what happens to a system when you interfere with it, you have to interfere with it.” -George E. P. Box, “The Use and Abuse of Regression”)

Jim Frost says

Hi Eric

Causality is a fascinating topic in statistics. It can be a surprisingly difficult thing to pin down. There is no statistical test that directly assesses causality. Instead, you really need to use specialized experimental procedures to infer causation, some of which do involve “interfering” with a system. I have written a number of articles on the topic. Below are some of them.

Causation in Statistics: Hill’s Criteria

Correlation vs. Causation

Randomized controlled trials