Internal and external validity relate to the findings of studies and experiments.
Internal validity evaluates a study’s experimental design and methods. You must have a valid experimental design to be able to draw sound scientific conclusions.
External validity assesses the applicability or generalizability of the findings to the real world. So, your study had significant findings in a controlled environment. But will you get the same results outside of the lab?
In this post, learn more about internal and external validity, how to increase both of them in a study, threats that can reduce them, and why studies high in one type tend to be low in the other.
If you’re interested in the validity of test scores and measurements rather than experiments, read my post Validity.
Internal Validity
Internal validity is the degree of confidence that a causal relationship exists between the treatment and the difference in outcomes. In other words, how well did the researchers perform the study? How likely is it that the treatment caused the differences in results that you observe? Are the researchers’ conclusions correct? Or can changes in the outcome be attributed to other causes?
Establishing internal validity involves assessing data collection procedures, the reliability and validity of the data, the experimental design, and even things such as the setting and duration of the experiment. It can also involve understanding events and natural processes that occur outside of the investigation. In other words, it’s the whole thing. Does the entirety of the experiment allow you to conclude that the treatment causes the differences in outcomes?
Studies that have a high degree of internal validity provide strong evidence of causality. On the other hand, studies with low internal validity provide weak evidence of causality.
How to Increase Internal Validity
Typically, highly controlled experiments improve internal validity. Experiments with the following features tend to have the highest internal validity:
- They occur in a lab setting to reduce variability from sources other than the treatment.
- Use random sampling to obtain a sample that represents the population.
- Use random assignment to create control and treatment groups that are equivalent at the beginning.
- Include a control group to understand treatment effects.
- Use blinding and other protocols that reduce the influence of extraneous factors, such as knowledge about the treatment and experimenter bias.
Removing these properties, such as moving from the lab to the real world, not being able to randomize, or not having a control group, reduces internal validity.
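To make random assignment concrete, here is a minimal Python sketch. The helper name `randomly_assign`, the subject IDs, and the simulated baseline weights are all hypothetical and exist only for illustration. The idea is that shuffling subjects before splitting them tends to produce groups with similar starting characteristics.

```python
import random
import statistics

def randomly_assign(subject_ids, seed=None):
    """Shuffle the subjects and split them into two equally sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical baseline weights (kg) for 200 subjects, keyed by subject ID.
data_rng = random.Random(42)
baseline = {i: data_rng.gauss(80, 12) for i in range(200)}

treatment, control = randomly_assign(baseline.keys(), seed=1)

mean_treatment = statistics.mean(baseline[i] for i in treatment)
mean_control = statistics.mean(baseline[i] for i in control)
print(f"Treatment baseline mean: {mean_treatment:.1f} kg")
print(f"Control baseline mean:   {mean_control:.1f} kg")
```

With groups of this size, the two baseline means usually land close together, which is what "equivalent at the beginning" means in practice. Random assignment does not guarantee identical groups in any single study; it only makes large imbalances unlikely.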
Internal validity relates to causality for a single study. For the study in question, did the treatment cause changes in the outcomes? Internal validity does not address generalizability to other settings, subjects, or populations. It only assesses causality for one study. We’ll get into the other issues when we talk about external validity.
Threats to Internal Validity
Threats to internal validity are types of confounding variables because they provide alternative explanations for changes in outcomes. They are threats because they make us doubt causality. The real reason for apparent treatment effects might be these potential threats.
For example, imagine a weight loss program where the researchers measure the subjects’ weights at the beginning, conduct the program, and then measure weights at the end. If the intervention causes weight loss, you’d expect to see decreases between the pretest and posttest.
However, there are various threats to attributing a causal connection between the weight loss program and the changes in weights. The following items are threats to internal validity.
History. An outside event that occurs between the pretest and posttest can affect the outcomes and reduce internal validity. Perhaps a fitness program became popular in town, and many subjects participated. It might be the fitness program that caused the weight loss rather than the weight loss program we’re studying.
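The following Python sketch illustrates why a control group helps with this kind of threat. All numbers are assumptions chosen for illustration: a program effect of -2 kg and an outside event that shifts everyone's weight by -3 kg. Because the outside event affects both groups, subtracting the control group's change removes it.

```python
import random
import statistics

rng = random.Random(11)

TRUE_EFFECT = -2.0     # assumed weight change (kg) caused by the program
HISTORY_EFFECT = -3.0  # assumed weight change (kg) caused by the outside event

def weight_change(treated):
    """Simulate one subject's pretest-to-posttest weight change in kg."""
    change = HISTORY_EFFECT + rng.gauss(0, 2)  # the outside event hits everyone
    if treated:
        change += TRUE_EFFECT
    return change

treatment_changes = [weight_change(True) for _ in range(500)]
control_changes = [weight_change(False) for _ in range(500)]

# Pre/post change in the treated group alone mixes both effects together.
naive_estimate = statistics.mean(treatment_changes)
# Subtracting the control group's change isolates the program's effect.
controlled_estimate = naive_estimate - statistics.mean(control_changes)

print(f"Pre/post change, treated only: {naive_estimate:.2f} kg")
print(f"Treated minus control:         {controlled_estimate:.2f} kg")
```

In this simulation, the treated-only estimate comes out near -5 kg, while the comparison against the control group comes out near the assumed -2 kg. The sketch assumes the outside event affects both groups equally, which random assignment makes plausible but does not guarantee.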
Maturation. The change between pretest and posttest scores might represent a process that occurs naturally over time and, thus, raises questions about internal validity. Imagine that instead of a weight loss program, we are studying an educational program. If the posttest scores are higher at the end, we might be observing regular knowledge acquisition rather than the program causing the increase. If it’s a natural process, we would have seen the same change even if the subjects did not participate in the experiment.
Testing. The pretest itself can influence outcomes by increasing awareness or sensitivity among test takers. For instance, the mere act of weighing the subjects might make them more weight conscious and increase their motivation to lose weight.
Instrumentation. The change between tests is an artifact of a difference between the pretest and posttest assessment instruments rather than an actual change in outcomes. This threat to internal validity can involve a change in the instrument, different instructions for administering the test, or researchers using different procedures to take measurements. For example, if the scale stops working correctly at some point after the pretest and displays lower weights in the posttest, the subjects’ weights appear to decrease even though nothing has changed.
Mortality refers to an experiment’s attrition rates amongst its subjects—not necessarily actual deaths! It becomes a problem when subjects with specific characteristics drop out of the study more frequently than other subjects. If these characteristics are associated with changes in the outcome variable, the systematic loss of subjects with these characteristics can bias the posttest results.
For example, in an experiment for an educational program, if the more dedicated learners have more extracurricular activities, they might be more likely to drop out of the study. Losing a disproportionate number of dedicated learners can deceptively reduce the apparent effectiveness of an educational program. This threat to internal validity is higher for studies that have relatively high attrition rates.
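A small simulation can show how selective attrition biases results. In this Python sketch, every number is an assumption made up for illustration: 40% of subjects are dedicated learners, they gain more from the program on average, and they drop out far more often than other subjects.

```python
import random
import statistics

rng = random.Random(7)

subjects = []
for _ in range(5000):
    dedicated = rng.random() < 0.4
    # Assumed true score gains: dedicated learners improve more.
    gain = rng.gauss(12 if dedicated else 4, 3)
    # Assumed dropout rates: dedicated learners leave more often.
    dropped = rng.random() < (0.5 if dedicated else 0.1)
    subjects.append((gain, dropped))

# Average gain if every subject had stayed in the study.
gain_all = statistics.mean(g for g, _ in subjects)
# Average gain among the subjects who actually completed the posttest.
gain_completers = statistics.mean(g for g, d in subjects if not d)

print(f"Mean gain, all subjects:    {gain_all:.2f}")
print(f"Mean gain, completers only: {gain_completers:.2f}")
```

Under these assumptions, the completers-only average understates the gain that the full group experienced, because the subjects who benefited most are underrepresented at the posttest.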
Regression to the mean. If you get an unusual average in the pretest, the group will tend to regress to the mean in the posttest. Suppose we’re assessing an education program and the pretest produces unusually low means. Regression to the mean will tend to cause the posttest to be higher even if the intervention doesn’t cause an increase.
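Regression to the mean is easier to see in a simulation. The Python sketch below runs many hypothetical studies in which the program has no effect at all; the class size, ability distribution, and amount of test-day noise are assumptions chosen for illustration. It then looks only at the studies whose pretest mean happened to be unusually low.

```python
import random
import statistics

rng = random.Random(0)

def run_study(n_students=30):
    """Return (pretest mean, posttest mean) for one class with NO real effect."""
    abilities = [rng.gauss(70, 5) for _ in range(n_students)]  # stable true ability
    pre = [a + rng.gauss(0, 10) for a in abilities]   # ability plus test-day noise
    post = [a + rng.gauss(0, 10) for a in abilities]  # same ability, fresh noise
    return statistics.mean(pre), statistics.mean(post)

studies = [run_study() for _ in range(2000)]

# Keep only the studies whose pretest mean was unusually low (bottom 5%).
studies.sort(key=lambda s: s[0])
unusual = studies[:100]

pre_mean = statistics.mean(s[0] for s in unusual)
post_mean = statistics.mean(s[1] for s in unusual)

print(f"Average pretest mean in the unusual studies:  {pre_mean:.1f}")
print(f"Average posttest mean in the unusual studies: {post_mean:.1f}")
```

Even though nothing changed between the tests, the posttest means in those studies drift back up toward the overall average. A researcher who saw only one of these studies could easily mistake that drift for a treatment effect.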
External Validity
External validity relates to the ability to generalize the results of the experiment to other people, places, or times. Researchers generally do not want findings that apply only to the relatively few subjects who participated in the study. Instead, they want to take the experimental results and apply them to a larger population. This is a key goal of inferential statistics.
For example, if you’re assessing a new medication or a new educational program, you don’t want to know that it’s effective for a handful of people. You want to apply those results beyond just the experimental setting and the particular individuals that participated. That’s generalizability—and the heart of the matter for external validity.
Unlike internal validity, external validity does not assess causality or rule out confounders.
There are two broad types of external validity—population and ecological.
Population validity relates to how well the experimental sample represents a population. Sampling methodology addresses this issue. If you use a random sampling technique to obtain a representative sample, it greatly helps you generalize from the sample to the population because they are similar. Population validity requires a sample that reflects the target population.
On the other hand, if the sample does not represent the population, it reduces external validity and you might not be able to generalize from the sample to the population.
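As a rough illustration of population validity, the Python sketch below compares a random sample with a convenience sample. The population is hypothetical: 20% students with ages centered on 21 and 80% non-students with ages centered on 45. The convenience sample draws only from the students, as might happen when recruiting on a campus.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population of ages: students first, then everyone else.
students = [rng.gauss(21, 2) for _ in range(2_000)]
non_students = [rng.gauss(45, 12) for _ in range(8_000)]
population = students + non_students

# Random sample: every member of the population is equally likely to be chosen.
random_sample = rng.sample(population, 200)
# Convenience sample: only students were available to recruit.
convenience_sample = students[:200]

pop_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
convenience_mean = statistics.mean(convenience_sample)

print(f"Population mean age:         {pop_mean:.1f}")
print(f"Random sample mean age:      {random_mean:.1f}")
print(f"Convenience sample mean age: {convenience_mean:.1f}")
```

The random sample's mean tends to fall close to the population mean, while the convenience sample's mean reflects only the subgroup it was drawn from. If age influences the outcome, results from the convenience sample might not generalize to the full population.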
Ecological validity relates to the degree of similarity between the experimental setting and the setting to which you want to generalize. The greater the similarity of key characteristics between settings, the more confident you can be that the results will generalize to that other setting. In this context, “key characteristics” are factors that can influence the outcome variable. Generalizability requires that the methods, materials, and environment in the experiment approximate the relevant real-world setting to which you want to generalize.
Threats to external validity are differences between experimental conditions and the real-world setting. Threats indicate that you might not be able to generalize the experimental results beyond the experiment. You performed your research in a particular context, at a particular time, and with specific people. As you move to different conditions, you lose the ability to generalize. The ability to generalize the results is never guaranteed. This issue is one that you really need to think about. If another researcher conducted a similar study in a different setting, would that study obtain the same results?
The following practices can help increase external validity:
- Use random sampling to obtain a representative sample from the population you are studying.
- Understand how your experiment is similar to and different from the setting(s) to which you want to generalize the results. Identify the factors that are particularly relevant to the research question and minimize the difference between experimental conditions and the real-world setting.
- Replicate your study. If you or other researchers replicate your experiment at different times, in various settings, and with different people, you can be more confident about generalizability.
Internal vs. External Validity: The Relationship Between Them
There tends to be a negative correlation between internal and external validity in experiments. Experiments that have high internal validity tend to have lower external validity, and vice versa.
Why does this happen?
To understand the reason, you must think about the experimental conditions that produce high degrees of internal and external validity. They’re diametrically opposed!
To produce high internal validity, you need a highly controlled environment that minimizes variability in extraneous variables. By controlling the environmental conditions, implementing strict measurement methodologies, using random assignment, and using a standardized treatment, you can effectively rule out alternative explanations for differences in outcomes. That produces a high degree of confidence in causality, which is high internal validity.
However, that artificial lab environment is a far cry from any real-world setting! To have high external validity, you want the experimental conditions to match the real-world setting. Observational studies are much more realistic than a lab setting. You experience the full impact of real-world variability! That creates high external validity because the experimental conditions are virtually the real-world setting. However, as I explain in my article about observational studies, that type of study opens the door to confounding variables and alternative explanations for differences in outcomes—in other words, lower internal validity!
So, what’s the answer?
Replication! Researchers can conduct multiple studies in different places using different methodologies: some true experiments in a lab, others observational studies in the field. This point reiterates the importance of replicating studies because no single study is ever enough.
As you can see, planning an experiment so you can draw valid conclusions and apply them to other settings requires a thorough assessment. Failure to do the appropriate planning for both internal and external validity can cause your experiment or study to produce results that you cannot trust!