Heterogeneity is defined as a dissimilarity between elements that comprise a whole. When heterogeneity is present, there is diversity in the characteristic under study. The parts of the whole are different, not the same. It is an essential concept in science and statistics. Heterogeneous is the opposite of homogeneous.
In chemistry, a heterogeneous mixture has a composition that varies from one part of the mixture to another. For example, oil and vinegar, sand and water, and salt and pepper are all heterogeneous mixtures. Multiple samples of these mixtures will contain different proportions of each component.
In statistics, heterogeneity is a vital concept that appears in various contexts, and its definition varies accordingly. Heterogeneity can indicate differences within individual samples, between samples, and between experimental results in a meta-analysis. It also applies to an assumption violation regarding errors in linear models. This post focuses on these statistical definitions of heterogeneity and shows you how to identify and test it statistically.
Heterogeneity in Individual Samples
When you take a sample from a population, you can assess its heterogeneity. Do the individual items in a sample tend to be relatively similar (homogeneous) or dissimilar? Do your data contain variability? If so, how much?
You can use a measure of dispersion to assess heterogeneity. For example, higher standard deviation values indicate the sample is more diverse. Conversely, lower values indicate the items tend to be similar. When there is perfect homogeneity, all the objects in the sample are the same, and the standard deviation equals zero.
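Here is a quick numeric illustration. The sketch below compares the sample standard deviations of two made-up samples (hypothetical values chosen only for illustration) plus one sample in which every item is identical. It uses Python's standard `statistics.stdev`, which applies the n − 1 denominator.

```python
from statistics import stdev

# Hypothetical measurements (made-up values for illustration only).
samples = {
    "A": [48, 50, 49, 51, 50, 52],   # items cluster tightly: fairly homogeneous
    "C": [30, 62, 45, 70, 38, 55],   # items spread out widely: more heterogeneous
    "identical": [50, 50, 50, 50],   # perfect homogeneity
}

for name, values in samples.items():
    # stdev() returns the sample standard deviation (n - 1 denominator).
    print(f"Sample {name}: SD = {stdev(values):.2f}")
```

With these numbers, Sample C's standard deviation (about 15.1) is roughly ten times Sample A's (about 1.4), and the identical sample's standard deviation is 0.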
You can also plot your data to evaluate heterogeneity. In the histogram below, Group C is more diverse than Group A because the items in Group C spread out further. This broader spread represents greater heterogeneity.
Heterogeneity Between Samples
You can also consider whether the properties of different samples, or groups in your data, are heterogeneous. When you collect multiple samples, do they tend to be similar or different? In this context, you need to be careful to define the properties that you are assessing. Some properties of the different samples can be heterogeneous, while others are homogeneous. In this section, I show you how to assess heterogeneity between samples for continuous and categorical data.
With continuous data, you can assess whether the means and the variability differ between samples. Boxplots display these characteristics side by side, so you can see whether they differ.
In the boxplot below, the groups have roughly homogeneous means and standard deviations.
The samples below have heterogeneous means but homogeneous variability.
In the graph below, the groups have homogeneous means but heterogeneous variability.
While these graphs visually depict heterogeneity, you can test these properties using statistical hypothesis tests.
For instance, one-way ANOVA compares the means of multiple samples; it tests whether the group means are heterogeneous. However, the traditional F-test ANOVA assumes that the groups have equal variances. In other words, ANOVA is designed to detect heterogeneous group means, but it requires the variability to be homogeneous.
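To make the ANOVA logic concrete, here is a minimal sketch of the one-way ANOVA F statistic in plain Python, using three hypothetical groups (made-up numbers). The F statistic compares the variation between group means with the variation within groups. Because the within-group term pools the scatter of all groups into one estimate, the method assumes the groups share a common variance. The helper name `one_way_anova_f` is hypothetical, not a library function.

```python
from statistics import mean, variance

# Hypothetical data for three groups (made-up values for illustration only).
groups = {
    "A": [12.1, 13.4, 11.8, 12.9, 13.0],
    "B": [14.2, 15.1, 14.8, 13.9, 15.5],
    "C": [12.5, 13.1, 12.2, 13.8, 12.7],
}

def one_way_anova_f(samples):
    """Return the one-way ANOVA F statistic and (df_between, df_within)."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    group_means = [mean(s) for s in samples]
    k, n = len(samples), len(all_values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    # Within-group sum of squares: scatter of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)

f_stat, dfs = one_way_anova_f(list(groups.values()))
print(f"F = {f_stat:.2f} with df = {dfs}")

# Informal look at the equal-variance assumption: the individual group variances.
variances = {name: variance(vals) for name, vals in groups.items()}
print("Group variances:", {name: round(v, 3) for name, v in variances.items()})
```

In practice, you would normally use a library routine such as SciPy's `scipy.stats.f_oneway`, which also returns the p-value, and a formal test of equal variances such as Levene's test (`scipy.stats.levene`).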
Related post: Boxplots vs. Individual Value Plots for Comparing Groups
For categorical data, you can assess the heterogeneity of the categories. We’ll consider M&M candies for these examples, which have six colors: brown, yellow, green, red, orange, and blue.
Again, note the difference between heterogeneity within a sample versus between samples.
A single M&M sample will be homogeneous if it contains only one color. The sample grows increasingly heterogeneous as the number of colors increases.
However, for multiple samples, homogeneity occurs when the number and proportions of colors are the same between them. Heterogeneous batches will have different color ratios.
The pie charts below display pairs of homogeneous and heterogeneous samples of M&M colors.
You can test this statistically for categorical data using the chi-square test of homogeneity. When your p-value is low, reject the null hypothesis (homogeneity) and conclude that the samples are heterogeneous. The category proportions differ enough between the samples to be statistically significant.
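The calculations for the chi-square test of homogeneity are the same as those for the test of independence. The differences lie in the hypotheses, the testing logic, and the sampling method: a test of homogeneity draws separate samples from two or more populations and compares their category proportions, while a test of independence draws a single sample and classifies each item on two categorical variables.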
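The test compares each observed count with the count you would expect if every sample shared the same category proportions. The sketch below computes the statistic by hand for two hypothetical bags of M&Ms (the counts are invented for illustration). Rather than computing an exact p-value, it compares the result with 11.07, the 5% critical value for 5 degrees of freedom; two bags by six colors gives (2 − 1)(6 − 1) = 5.

```python
# Hypothetical color counts for two bags of M&Ms (invented numbers, not real data).
colors = ["brown", "yellow", "green", "red", "orange", "blue"]  # column order
observed = [
    [12, 10, 14, 11, 15, 18],  # bag 1
    [20, 8, 6, 15, 9, 22],     # bag 2
]

def chi_square_homogeneity(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count if all samples shared the same category proportions.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, dof = chi_square_homogeneity(observed)
CRITICAL_VALUE = 11.07  # chi-square 5% critical value, valid only for df = 5
print(f"chi-square = {stat:.2f} with df = {dof}")
if stat > CRITICAL_VALUE:
    print("Reject homogeneity: the color proportions differ between bags.")
else:
    print("Fail to reject homogeneity at the 5% level.")
```

If SciPy is available, `scipy.stats.chi2_contingency` performs the same calculation and returns the p-value directly.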
Related post: Chi-square Test of Independence
Heterogeneity Between Scientific Studies
When you consider a series of scientific studies that all attempt to answer the same research question, you can assess the heterogeneity of their results. Meta-analysis does more than simply report the mean effect size for a set of studies. This type of analysis also considers the variability of effect sizes from the individual studies around the overall mean effect—which is where heterogeneity comes in!
Ideally, the study results are all similar (i.e., homogeneous). When that’s true, they’re all painting the same picture, giving you confidence about the real effect. However, if the results are heterogeneous, you’ll need to proceed carefully and understand the differences between the findings. You’ll also want to evaluate the degree of heterogeneity. Do the studies differ greatly, or only slightly?
I’ll show you a graphical and numeric way to evaluate heterogeneity in a meta-analysis.
A forest plot, also known as a blobbogram, is a specialized plot designed to display the results of different studies in a meta-analysis. These plots depict effect sizes on the horizontal axis and include a reference line for no effect. For each experiment, it displays a point estimate for the effect and a confidence interval (CI). You can use a forest plot to evaluate heterogeneity in a meta-analysis.
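Each row of a forest plot is typically a point estimate with a 95% confidence interval. Assuming each study reports an effect estimate and its standard error, and that a normal approximation is adequate, the sketch below builds Wald-type intervals (estimate ± 1.96 × SE) and flags the ones that include the no-effect value of zero. The study names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Study:
    name: str
    estimate: float  # effect size on a scale where 0 means no effect (e.g. a log risk ratio)
    se: float        # standard error of the estimate

# Hypothetical studies with invented numbers (not the BCG data).
studies = [
    Study("Study 1", -0.90, 0.30),
    Study("Study 2", -0.20, 0.25),
    Study("Study 3", 0.10, 0.15),
    Study("Study 4", -1.40, 0.40),
]

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval

def confidence_interval(study, z=Z_95):
    """Return the (lower, upper) Wald-type CI for one study's estimate."""
    return study.estimate - z * study.se, study.estimate + z * study.se

for s in studies:
    lo, hi = confidence_interval(s)
    crosses_null = lo <= 0.0 <= hi
    note = "  (includes no effect)" if crosses_null else ""
    print(f"{s.name}: {s.estimate:+.2f} [{lo:+.2f}, {hi:+.2f}]{note}")
```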
The forest plot below displays 13 studies and their estimates of the effectiveness of a Bacillus Calmette-Guérin (BCG) vaccine in preventing tuberculosis (TB).
Overall, the studies favor the treatment group that received the vaccine over the control group, which did not receive it. However, there are differences between the studies. Studies have CIs of different widths. Some CIs include the null value of zero (no effect), while others do not. One study’s point estimate even favors the control group! Several other estimates fall right on the no-effect line.
While the graph displays heterogeneity in the meta-analysis, we need to quantify it. This necessity brings us to the I² statistic, which I’ve circled on the forest plot.
Related post: Control Groups in Experiments
The I² statistic quantifies the degree of heterogeneity in a series of studies within a meta-analysis. This statistic is a percentage that ranges from 0 to 100%. It indicates the percentage of the total variation in effect sizes across studies that reflects genuine differences between the studies rather than sampling error.
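For reference, I² is usually computed from Cochran's Q, the inverse-variance weighted sum of squared deviations of the k study estimates from the pooled fixed-effect estimate. When the studies are homogeneous, Q is expected to be about k − 1, so only the excess over k − 1 counts as heterogeneity:

```latex
w_i = \frac{1}{\mathrm{SE}_i^2}, \qquad
\hat{\theta}_{\mathrm{FE}} = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i}, \qquad
Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}_{\mathrm{FE}}\right)^2

I^2 = \max\!\left(0, \frac{Q - (k - 1)}{Q}\right) \times 100\%
```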
Statisticians commonly use the following benchmark values to assess the degree of heterogeneity:
- 25%: Small
- 50%: Moderate
- 75%: Large
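Here is a minimal sketch that computes Cochran's Q and I² for a handful of hypothetical studies (invented estimates and standard errors, not the BCG data) and maps the result onto the benchmarks above. Treating 25%, 50%, and 75% as lower cut-offs is a simplifying assumption on my part; the benchmarks are rough guides, not sharp thresholds.

```python
# Hypothetical study results (invented numbers, not the BCG data):
# each tuple is (effect estimate, standard error).
results = [(-0.90, 0.30), (-0.20, 0.25), (0.10, 0.15), (-1.40, 0.40), (-0.60, 0.20)]

def i_squared(studies):
    """Return (pooled fixed-effect estimate, Cochran's Q, I^2 in percent)."""
    weights = [1 / se ** 2 for _, se in studies]  # inverse-variance weights
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (est - pooled) ** 2 for w, (est, _) in zip(weights, studies))
    df = len(studies) - 1
    # I^2 = (Q - df) / Q, truncated at 0 when Q does not exceed its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

def benchmark_label(i2):
    """Assumed reading of the 25/50/75% benchmarks as lower cut-offs."""
    if i2 >= 75:
        return "large"
    if i2 >= 50:
        return "moderate"
    if i2 >= 25:
        return "small"
    return "below the small benchmark"

pooled, q, i2 = i_squared(results)
print(f"Pooled estimate = {pooled:.3f}, Q = {q:.2f}, I^2 = {i2:.1f}% ({benchmark_label(i2)})")
```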
On the forest plot above, the value is 92.22%. These studies have considerable heterogeneity. We must proceed with caution when assessing the overall effectiveness of the BCG vaccine. They are not telling a consistent story!
Heterogeneous Errors in Linear Models
Linear models assume that the errors have a constant variance. When you plot the residuals against the fitted values, you want to see dispersion that remains consistent across the entire range. Unfortunately, that’s not always the case. Statisticians refer to heterogeneous residuals (non-constant variance) as heteroscedasticity, which violates the assumption. The residual plot below shows this condition.
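Notice how the spread of the residuals increases as you move to higher fitted values. Fortunately, there are several ways to address this condition, such as transforming the response variable, fitting the model with weighted least squares, or using heteroscedasticity-consistent (robust) standard errors.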
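Alongside the residual plot, you can put a rough number on the pattern. The sketch below fits a simple least-squares line, sorts the residuals by fitted value, and compares the residual variance in the upper half with that in the lower half; a ratio well above 1 matches a widening pattern like the one in the plot. This is an informal, simplified check, not a formal test such as Breusch-Pagan or Goldfeld-Quandt, and the data are hypothetical values constructed so that the deviations grow with x. The helper names `fit_line` and `spread_ratio` are illustrative.

```python
from statistics import mean, variance

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b1 * mx, b1

def spread_ratio(x, y):
    """Residual variance in the upper half of fitted values divided by the lower half.

    Values far above 1 suggest the residual spread grows with the fitted values.
    """
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Order residuals by fitted value, then split them into lower and upper halves.
    ordered = [r for _, r in sorted(zip(fitted, residuals))]
    half = len(ordered) // 2
    return variance(ordered[half:]) / variance(ordered[:half])

# Hypothetical data: deviations from the line alternate in sign and grow with x.
x = list(range(1, 41))
y = [3 + 2 * xi + 0.5 * xi * (-1) ** xi for xi in x]
print(f"Residual variance ratio (upper/lower half): {spread_ratio(x, y):.1f}")
```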