What is a Representative Sample?
A representative sample is one where the individuals in the sample reflect the properties of an entire population. Use a representative sample when you want to generalize the results from the sample to a population. By studying a representative sample, you can approximate the properties of the population from which it was drawn.
How do you tell if a sample represents a population?
Generally speaking, a representative sample proportionally reflects the attributes of a population. The demographic characteristics of the individuals in the sample must be similar to those in the population: gender, rural or urban residence, religion, marital status, income level, and so on. However, the relevant attributes depend on your study area. For example, if it’s a health study, you might need to consider other aspects such as health habits, BMI, and blood pressure.
Each study needs to define the target population it wants the sample to represent. You’ll need to do some research to understand the population. During the course of the study, the researchers will learn about the people in the target population.
Read on to see examples of using representative samples, obtaining them, evaluating their representativeness, and knowing the statistical procedures you can use with them.
Related post: Populations vs. Samples: Uses and Examples
Using A Representative Sample to Learn About a Population
Researchers usually want to learn about a population. After all, if you’re studying opinions, attitudes, characteristics, or the effects of a new medication, generalizing the results to an entire population is much more valuable than understanding only the relatively few participants in the study. If you can’t generalize the results, they apply only to that specific sample. A representative sample makes generalization possible.
Unfortunately, populations are usually too large to measure fully. Consequently, researchers must use a manageable subset of that population to learn about it.
Inferential statistics are procedures that use a sample to infer the properties of a population. However, these methods are only valid when the sample resembles the population—a representative sample. Conversely, when your sample doesn’t look like the population you’re studying, you can’t trust that the sample results will generalize to the population.
For all these reasons, researchers strongly prefer obtaining representative samples whenever possible. In fact, it’s a crucial part of experimental design. However, this type of sample can be the hardest to get.
Statistical inference is the process of using a sample to learn about the population. Learn more about Making Statistical Inferences.
Examples of Using Representative Samples
Suppose you are assessing the approval of a controversial opinion in a state. You define your population as all adults in the state. Unfortunately, it’s impractical to contact all adults. Instead, you need to obtain a representative sample.
Your sample needs to resemble the whole population. It should include all demographic groups (gender, rural or urban residence, income level, etc.) in the same proportions as the population. For example, the sample is not truly representative if rural residents or males are overrepresented.
After collecting your sample, you can administer the survey. The proportion in your sample who approve of the opinion is an unbiased estimate of the population proportion.
Other examples of using representative samples include the following:
- Election polling for a particular jurisdiction.
- Surveys of a specific profession, such as medical doctors.
- Literature preferences of Master’s level English students.
- Income distribution among farmers in a particular country.
- Vaccine effectiveness in healthy adults.
Defining the Population for a Representative Sample
Notice how all these examples specify what you’re measuring and define a population. Crucially, to obtain a representative sample, the researchers must first have a clear definition of the population. This definition states who the researchers are learning about.
In other words, what population should the sample represent? You can’t have a representative sample if you don’t know which population it should look like! Additionally, when you generalize the sample results, you need to identify the population about which you are inferring the properties.
Researchers can define the population to meet the needs of their study. For example, I once read an article about a study that defined its population as adult Swedish women (with specific age requirements for inclusion) who have osteoporosis but are otherwise healthy.
After defining your population and drawing a representative sample, you can measure your variables of interest and then generalize them to the population with the appropriate margins of error.
How to Draw a Representative Sample
Representative samples are the best type for researchers. Unfortunately, they’re also more expensive and time-consuming to collect than non-representative samples. If you don’t care about generalizing the results, you can use convenience sampling. By definition, those samples are cheaper and easier to collect. However, their results apply only to that specific sample, limiting their usefulness.
For a representative sample, you can’t obtain only the easy-to-find participants. Instead, you need to include a full spectrum of the population, including the hard-to-contact folks.
Consequently, you must use a specialized sampling method to obtain a representative sample.
There are a variety of sampling methods that can produce a representative sample. In this post, I’ll summarize the most common technique—simple random sampling. Below are links to more detailed posts about various sampling methods.
For simple random sampling, you’ll need to compile a complete list of the population, known as the sampling frame. This list includes all people in the population and does not contain any individuals who are not in the population. The process of creating the list by itself can entail lots of work!
Then you draw a sample from this list using a method that gives every individual on it an equal chance of being selected. Other random sampling techniques exist, but simple random sampling is the most common.
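As a rough illustration of drawing from a sampling frame, here is a minimal Python sketch. The frame of 10,000 ID numbers, the sample size of 500, and the function name `simple_random_sample` are all hypothetical and chosen only for the example.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    if n > len(sampling_frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(sampling_frame, n)  # sampling without replacement

# Hypothetical sampling frame: ID numbers for 10,000 adults in the population.
frame = list(range(1, 10_001))
sample = simple_random_sample(frame, n=500, seed=42)
print(len(sample), "people selected")
```

In a real study, the hard part is usually building an accurate frame, not the random draw itself.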
After obtaining your sample, the complications continue. Thanks to the random sampling, your participants will tend to be scattered geographically and include those who are harder to reach, increasing administrative and travel costs.
While it is more challenging, getting a representative sample increases the external validity of your study, which is the ability to generalize the results beyond the sample.
Learn about Sample Size Essentials: The Foundation of Reliable Statistics.
Other Representative Sampling Methods
Read an Overview of Different Types of Sampling Methods in Research to learn about other methods for obtaining a representative sample.
Another question about representative samples is, how large should yours be? A power analysis can help you out! Learn How to Calculate Sample Size Needed for Power.
Evaluating the Representativeness of a Sample
The population characteristics that a representative sample must approximate depend on the research topic. In a nutshell, your sample’s demographics should roughly match the population’s. When you know something about the population’s demographic characteristics, possibly through other research, you can evaluate how well your sample fits them. If your sample and the population have similar properties, your confidence in the validity of generalizing your results increases.
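One simple way to make this comparison is to line up each group's share in the sample against its known share in the population. The sketch below assumes you already have population shares from another source. The 70/30 urban and rural split, the sample counts, and the function name `compare_shares` are made up for illustration.

```python
from collections import Counter

def compare_shares(sample_values, population_shares):
    """Return {group: (sample share, population share, difference)}."""
    counts = Counter(sample_values)
    n = len(sample_values)
    report = {}
    for group, pop_share in population_shares.items():
        sample_share = counts.get(group, 0) / n
        report[group] = (sample_share, pop_share, sample_share - pop_share)
    return report

# Hypothetical population shares, e.g., taken from census figures.
population_shares = {"urban": 0.70, "rural": 0.30}
# Hypothetical sample of 500 respondents.
sample_values = ["urban"] * 330 + ["rural"] * 170

report = compare_shares(sample_values, population_shares)
for group, (sample_share, pop_share, diff) in report.items():
    print(f"{group}: sample {sample_share:.2f}, population {pop_share:.2f}, difference {diff:+.2f}")
```

Small differences are expected from chance alone. Large or one-sided differences suggest a problem with how the sample was obtained.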
When you use one of the representative sampling methods, what can cause a sample to look different from the population?
Statisticians refer to the difference between a sample estimate and the population value as sampling error. Sample estimates will never precisely equal the population value. For example, the proportion of respondents agreeing with an opinion in a representative sample won’t equal the population value exactly. Some degree of sampling error is unavoidable.
A wide range of issues can cause these differences. I’ll assign them to two groups: systematic errors and random errors.
Systematic Errors
These errors usually happen because of a problem with the study or its representative sampling method. Perhaps the sampling frame was incomplete or contained members of other populations. Perhaps the selection method was flawed, or the recruitment method failed to connect with certain types of people. These errors can cause bias. Bias occurs when the sample estimates are systematically too high or too low.
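To see how an incomplete sampling frame produces bias, consider the small simulation below. Everything in it is hypothetical: a population of 10,000 adults, approval rates of 60% among urban adults and 30% among rural adults, and a flawed frame that lists only 300 of the 3,000 rural adults. It is a sketch of the idea, not a model of any real study.

```python
import random

def mean_estimate(frame, n, repetitions, rng):
    """Average the sample approval rate over many simple random samples."""
    total = 0.0
    for _ in range(repetitions):
        sample = rng.sample(frame, n)
        total += sum(sample) / n
    return total / repetitions

rng = random.Random(1)

# Hypothetical population: 1 = approves, 0 = does not approve.
urban = [1] * 4200 + [0] * 2800       # 7,000 urban adults, 60% approve
rural = [1] * 900 + [0] * 2100        # 3,000 rural adults, 30% approve
population = urban + rural
true_rate = sum(population) / len(population)

# Flawed frame: all urban adults, but only 300 rural adults (still 30% approve).
rural_listed = [1] * 90 + [0] * 210
incomplete_frame = urban + rural_listed

full_avg = mean_estimate(population, n=200, repetitions=500, rng=rng)
flawed_avg = mean_estimate(incomplete_frame, n=200, repetitions=500, rng=rng)

print(f"true approval rate:             {true_rate:.3f}")
print(f"average estimate, full frame:   {full_avg:.3f}")
print(f"average estimate, flawed frame: {flawed_avg:.3f}")
```

With the complete frame, the estimates center on the true rate. With the flawed frame, they center on a higher value because rural adults are underrepresented, and taking more samples or larger samples from that frame does not fix it.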
To help make your sample more representative, you can reduce systematic errors by finding and correcting procedural problems.
Additionally, nonrandom missing data can be a form of systematic error. Learn more about Missing Data Overview: Types, Implications & Handling.
Random Errors
Representative sampling methods use some form of random sampling. The randomness helps prevent bias. However, randomness also guarantees that your sample won’t be 100% representative of the population. Chance inexorably causes some error because the likelihood of obtaining just the right sample that matches the population is practically zero. In rare cases, the sample can differ greatly from the population due to chance alone.
However, because these errors are random, they produce sample values equally likely to be higher or lower than the population value. Hence, they are unbiased but affect the estimate’s precision.
The key way to reduce random error and make your sample more representative is by increasing the sample size.
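The effect of sample size on random error can be sketched with a short simulation. It assumes a hypothetical population in which 60% approve of an opinion, draws many samples at each of three sample sizes, and reports how much the sample proportions vary. The function name and all the numbers are illustrative assumptions.

```python
import random
import statistics

def spread_of_estimates(true_p, n, repetitions=1000, seed=0):
    """Simulate many samples of size n and return the mean and
    standard deviation of the resulting sample proportions."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        approvals = sum(rng.random() < true_p for _ in range(n))
        estimates.append(approvals / n)
    return statistics.mean(estimates), statistics.stdev(estimates)

for n in (50, 200, 800):
    mean_est, sd_est = spread_of_estimates(true_p=0.6, n=n)
    print(f"n={n}: average estimate {mean_est:.3f}, spread {sd_est:.3f}")
```

The average estimate should stay close to 0.6 at every sample size, which reflects the lack of bias, while the spread shrinks as the sample size grows. In theory, quadrupling the sample size roughly halves the spread.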
Sampling error is always an issue. Learn more about sampling error, what causes it, and how to manage it.
Analyzing Representative Samples
After obtaining a representative sample, what can you do with it? Using inferential statistics, you have a variety of options. Remember that while your sample statistics are unbiased estimates of the population parameters, there is a degree of error associated with them. For example, the sample mean won’t equal the population mean exactly. Inferential statistics incorporate this error into the results.
Below are summaries of the broad types of inferential procedures you can use with representative samples. Please click the links to learn more about the procedures.
Learn about inferential statistics vs. descriptive statistics.
- Hypothesis Testing: Uses representative samples to assess two mutually exclusive hypotheses about a population. Statistically significant results suggest that the sample effect or relationship exists in the population after accounting for sampling error.
- Confidence Intervals: A range of values likely containing the population value. This procedure evaluates the sampling error and adds a margin around the estimate, giving an idea of how wrong it might be.
- Margin of Error: The amount added to and subtracted from an estimate to form a confidence interval. It is most often reported for survey results.
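For a survey proportion, the last two items can be sketched in a few lines of Python. This example uses the common normal approximation with z = 1.96 for roughly 95% confidence, which is one of several available methods and is an assumption of this sketch. The survey result of 540 approvals out of 1,000 respondents is hypothetical.

```python
import math

def proportion_margin_of_error(successes, n, z=1.96):
    """Normal-approximation margin of error and confidence interval
    for a sample proportion (z=1.96 gives roughly 95% confidence)."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Hypothetical survey: 540 of 1,000 respondents approve.
p_hat, margin, interval = proportion_margin_of_error(successes=540, n=1000)
print(f"estimate {p_hat:.3f}, margin of error {margin:.3f}")
print(f"confidence interval {interval[0]:.3f} to {interval[1]:.3f}")
```

The calculation accounts only for random sampling error. It does not correct for systematic errors, so it is meaningful only when the sample is representative.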