What is Sampling Bias?
Sampling bias in statistics occurs when a sample does not accurately represent the characteristics of the population from which it was drawn. When this bias occurs, sample attributes are systematically different from the actual population values. Hence, sampling bias produces a distorted view of the population. Sampling bias often involves human subjects, but it can also apply to samples of objects and animals. Medical researchers refer to this problem as ascertainment bias.
Sampling bias often exists when population members have differing probabilities of participating. In other words, the study is more likely to select specific subgroups or people with particular attributes than others.
In everyday language, “bias” has a negative connotation. However, in statistics, bias indicates a systematic tendency for a sample statistic to overestimate or underestimate a population parameter. In many cases, sampling bias results from challenges in collecting representative samples or from design oversights rather than from intentional deception. This problem is a distinct concept from sampling error. Bias relates to accuracy, while error relates to precision. Learn about the differences between accuracy vs. precision.
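To make the distinction between bias and sampling error concrete, here is a minimal simulation sketch. The population, the selection mechanism (higher values are more likely to be picked), and all names are illustrative assumptions, not something taken from a real study.

```python
import itertools
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 values centered near 50.
population = [random.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Assumed biased selection: a member's chance of being picked grows with
# its value, so high values are overrepresented in the sample.
cum_weights = list(itertools.accumulate(max(x, 0.0) for x in population))

def random_sample_mean(n):
    """Estimate from a simple random sample (equal inclusion probabilities)."""
    return statistics.fmean(random.sample(population, n))

def biased_sample_mean(n):
    """Estimate from a sample drawn with unequal inclusion probabilities."""
    draws = random.choices(population, cum_weights=cum_weights, k=n)
    return statistics.fmean(draws)

# Estimation error (estimate minus true mean) at a small and a large sample size.
errors = {}
for n in (50, 5_000):
    errors[n] = {
        "random": random_sample_mean(n) - true_mean,
        "biased": biased_sample_mean(n) - true_mean,
    }
    print(n, errors[n])
```

In a run like this, the random sample's error should shrink toward zero as the sample grows, which is sampling error at work. The biased sample's error should instead settle near a fixed positive offset, because a larger sample does nothing to correct unequal selection probabilities.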
Researchers can intentionally oversample subpopulations to obtain better estimates of those subgroups. When analysts understand the nature of the bias, they can use sample weighting to find unbiased population estimates.
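As a rough illustration of sample weighting, the sketch below uses made-up numbers: two hypothetical subgroups ("urban" and "rural"), population shares that are assumed to be known, and a sample that deliberately oversamples the smaller group. Each observation is weighted by its group's population share divided by its sample share.

```python
from collections import Counter

# Assumed known population shares for two hypothetical subgroups.
population_share = {"urban": 0.90, "rural": 0.10}

# Hypothetical sample that oversamples the rural subgroup (50/50 split).
sample = [
    ("urban", 62.0), ("urban", 58.0), ("urban", 60.0), ("urban", 60.0),
    ("rural", 41.0), ("rural", 39.0), ("rural", 40.0), ("rural", 40.0),
]

def unweighted_mean(rows):
    """Plain sample mean, which ignores the oversampling."""
    return sum(value for _, value in rows) / len(rows)

def weighted_mean(rows, shares):
    """Mean with each row weighted by population share / sample share."""
    counts = Counter(group for group, _ in rows)
    n = len(rows)
    weights = [shares[group] / (counts[group] / n) for group, _ in rows]
    total = sum(w * value for w, (_, value) in zip(weights, rows))
    return total / sum(weights)

naive = unweighted_mean(sample)                      # pulled toward the rural mean
adjusted = weighted_mean(sample, population_share)   # matches 0.9 * 60 + 0.1 * 40
print(naive, adjusted)
```

The unweighted mean treats the two groups as equally common, while the weighted mean restores each group's population share. This only works when the analyst knows how the sample departs from the population, which is the condition described above.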
When you don’t understand the nature and degree of the sampling bias, it reduces the external validity of the research study. This problem limits generalizations from the sample to the population. Analysts can generalize findings only to populations that are like their sample.
Sampling bias is a subtype of selection bias. Learn more about Selection Bias: Definition and Examples.
Causes of Sampling Bias
The study design frequently causes sampling bias. Consequently, identifying the source of sampling bias requires you to assess the nuts and bolts of the study. How does a study select its subjects? Are particular subgroups more or less likely to participate?
Let’s take a look at several of the many potential causes.
Probability and non-probability sampling methods are two broad approaches for drawing samples from a population.
Probability methods are rigorous attempts to draw representative samples. When every population member has a known, nonzero probability of inclusion (equal probabilities in the simplest case), the risk of sampling bias is significantly lower, but real-world complications can intrude.
For example, you might think your sampling frame contains all population members, but it might not. If it is incomplete, perhaps missing certain types of population members, your sample will be biased despite your best efforts.
Alternatively, the subjects you contact can accurately represent the population, but those who participate and complete the study might not. Subjects with particular characteristics might be less likely to agree to participate or more likely to drop out before the project finishes. For instance, sick and injured people might be less likely to participate in an exercise study.
Conversely, studies that use non-probability sampling methods have a greater risk of sampling bias. These sampling approaches rely on convenience, researcher judgment, or the subjects themselves to recruit participants. As a result, these studies are more likely to include some recruits than others, causing them to misrepresent the population.
Sampling Bias Examples
Sampling bias can occur for many reasons. Let’s look at some types of sampling bias!
Self-selection bias occurs when potential recruits with particular attributes are more likely to participate in the study. This type of sampling bias overrepresents subjects with those attributes.
For example, suppose you’re conducting a survey about local water quality. People already interested in this topic are more likely to respond and, thus, be overrepresented in the results. This group likely has opinions that differ from the general population.
Nonresponse bias is the opposite of the previous bias. Potential subjects with specific characteristics are less likely to participate or may drop out before the study ends.
For example, subjects with health issues might not be able to complete a study for a physical fitness program. Consequently, the program appears more effective in the sample than in the population.
Learn more about nonresponse bias, how to reduce it during a study, and how to adjust for it afterward in: Nonresponse Bias: Definition & Reducing.
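The fitness program example can be sketched as a small simulation. Every number here is an assumption chosen for illustration: 30% of enrolled subjects have health issues, those subjects improve less on average, and they are less likely to complete the program.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def simulate_subject():
    """Return (improvement, completed) for one hypothetical enrolled subject."""
    has_health_issue = random.random() < 0.30
    # Assumed effect sizes: subjects with health issues improve less on average.
    improvement = random.gauss(1.0 if has_health_issue else 5.0, 2.0)
    # Assumed completion rates: subjects with health issues drop out more often.
    completed = random.random() < (0.40 if has_health_issue else 0.90)
    return improvement, completed

subjects = [simulate_subject() for _ in range(10_000)]

# Average improvement across everyone enrolled versus only those who finished.
all_mean = statistics.fmean([imp for imp, _ in subjects])
completer_mean = statistics.fmean([imp for imp, done in subjects if done])
print(all_mean, completer_mean)
```

Under these assumptions, the average among completers should come out noticeably higher than the average among everyone enrolled, because dropouts are concentrated in the group that benefits least.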
Survivorship bias occurs when a study evaluates only participants who have successfully passed a selection process and excludes those who did not.
Studies that assess a sample of existing companies are a classic example of this bias. By focusing on the financial status of active companies, these studies don’t include those that have gone out of business. Consequently, the sample estimate of businesses’ financial health will be rosier than that of the full population of firms, including those that have gone under.
Learn more in-depth about Survivorship Bias Examples and Avoiding It.
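A short sketch of the company example follows, using hypothetical profit margins and an assumed rule that any firm below a failure threshold goes out of business and disappears from the sample.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical profit margins for 5,000 firms, including ones that later fail.
margins = [random.gauss(0.02, 0.10) for _ in range(5_000)]

# Assumed rule for illustration: firms below this margin go out of business.
FAILURE_THRESHOLD = -0.05
survivors = [m for m in margins if m >= FAILURE_THRESHOLD]

all_firms_mean = statistics.fmean(margins)   # the full population of firms
survivor_mean = statistics.fmean(survivors)  # what a study of active firms sees
print(all_firms_mean, survivor_mean)
```

Because the failed firms are exactly the ones with the lowest margins, removing them can only raise the average, so the survivor-only estimate looks healthier than the full population.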
Symptom-Based Sampling
Cases that are diagnosed or referred for treatment tend to involve more severe symptoms than cases that go undiagnosed. This kind of sampling bias occurs in medical and psychological studies.
For example, referrals for reading comprehension problems typically are the more severe cases. However, there might be many more students with milder forms who struggle but are not diagnosed. Consequently, the sample overestimates the severity of the problem and underestimates the frequency of milder cases.
Undercoverage bias underrepresents hard-to-find subgroups. The fact that they are difficult to contact means they are less likely to be included in the sample.
For example, homeless people are unlikely to appear on various lists and may not have an address or phone number. Consequently, samples are unlikely to include them.
Learn more about Undercoverage Bias: Definition & Examples.
This kind of sampling bias happens when advertising is likely to attract subjects with specific characteristics.
For example, a study that advertises a fitness improvement program is more likely to find subjects who are already motivated to get fit. Hence, the program might be more effective in this sample than in the general population.
Avoiding Sampling Bias
The previous examples of sampling bias illustrate a few of the causes. Each study has potential avenues for bias. I hope the examples highlight the importance of thinking critically about these issues.
There are no cookie-cutter answers or guaranteed fixes. You’ll need to use your subject-area knowledge and evaluate how particular subgroups might be overrepresented or underrepresented. There is a multitude of context-sensitive possibilities.
Below are some general approaches to consider:
- Use probability sampling based on a sampling frame that includes all population members.
- Identify hard-to-reach subgroups and make an extra effort to include them.
- Reduce barriers that might exclude some participants, such as having flexible hours and multiple locations.
- Follow up with recruits who don’t respond or who drop out of the study.
- Find subjects using a process that doesn’t entirely depend on them passing a test, satisfying criteria, being diagnosed, or responding to an ad.
Research is complex and challenging. In many cases, avoiding all sources of sampling bias is impossible. However, you can take steps to minimize it. Even when you can’t eliminate it, understanding sampling bias can help you better interpret the results. For example, if you advertise for an intervention, you might realize that your sample represents the more motivated individuals rather than the general population.
Cognitive bias can also affect research results by influencing both the participants and the researchers. Learn more about Cognitive Bias: Definition & Examples.