An endogenous variable is a variable in a statistical model whose value is determined, at least in part, by other variables within the model itself. In regression analysis, an explanatory variable is endogenous when it is correlated with the model's error term, and this correlation biases the estimated coefficients.
Endogeneity commonly arises from omitted variables, measurement error, or simultaneous causality. Confounding variables can cause endogeneity, but the two ideas are not the same: a confounder is an outside factor that distorts the relationship between variables, whereas an endogenous variable appears inside the model and has a value determined, at least partially, by relationships within the system being modeled. Endogenous regressors violate the exogeneity assumption of linear regression (that the error term has zero mean given the regressors), so ordinary least squares can produce biased and inconsistent estimates. Addressing endogeneity is therefore critical for valid causal inference.
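To make the measurement-error case concrete, the following minimal Python sketch simulates a regressor that is observed with random noise. The data-generating process, the true coefficient of 2.0, the noise level, and the helper names (`ols_slope`, `simulate_measurement_error`) are hypothetical choices for illustration only. When the true value x is replaced by x + noise, the noise ends up both in the regressor and in the error term, so the two are correlated and the estimated slope shrinks toward zero (attenuation bias).

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_measurement_error(n=50_000, beta=2.0, noise_sd=1.0, seed=0):
    """Return (slope using the true x, slope using a noisy measurement of x)."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0, 1) for _ in range(n)]
    # The outcome depends on the true regressor plus an independent error.
    y = [beta * xi + rng.gauss(0, 1) for xi in true_x]
    # The analyst only sees a noisy measurement of x.
    observed_x = [xi + rng.gauss(0, noise_sd) for xi in true_x]
    return ols_slope(true_x, y), ols_slope(observed_x, y)


if __name__ == "__main__":
    exact, noisy = simulate_measurement_error()
    print(f"slope with true x:  {exact:.3f}")  # close to 2.0
    print(f"slope with noisy x: {noisy:.3f}")  # close to 1.0 (attenuated)
```

With equal variances for the true regressor and the noise, the expected attenuation factor is Var(x) / (Var(x) + Var(noise)) = 0.5, so the slope on the noisy regressor should land near 1.0 rather than 2.0.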
For example, in a model studying how education affects income, education might be endogenous if unmeasured factors, such as motivation or family background, influence both education and income. Because motivation is not included in the model, its effect on income is absorbed into the error term; and because motivation also affects how much education a person obtains, the education variable (which is in the model) becomes correlated with that error term. This correlation makes education endogenous and can bias the estimated effect of education on income: if motivation raises both education and income, for instance, the regression overstates the return to education.
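The education example can be sketched as a small Python simulation. Every number here is a hypothetical assumption made for illustration: motivation is standard normal, education equals 0.8 × motivation plus noise, and income equals 1.0 × education + 1.5 × motivation plus noise, so the true effect of education is 1.0. The sketch compares a regression that omits motivation with one that controls for it. The controlled estimate uses the Frisch-Waugh-Lovell result (regress income on the part of education not explained by motivation), which gives the same education coefficient as a multiple regression on both variables. The helper names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def residualize(x, z):
    """Residuals of x after a simple least-squares regression of x on z."""
    b = ols_slope(z, x)
    a = sum(x) / len(x) - b * sum(z) / len(z)
    return [xi - (a + b * zi) for xi, zi in zip(x, z)]


def simulate_education_income(n=50_000, true_effect=1.0, seed=1):
    """Return (naive slope omitting motivation, slope controlling for motivation)."""
    rng = random.Random(seed)
    motivation = [rng.gauss(0, 1) for _ in range(n)]
    # Motivation raises education...
    education = [0.8 * m + rng.gauss(0, 1) for m in motivation]
    # ...and also raises income directly.
    income = [
        true_effect * e + 1.5 * m + rng.gauss(0, 1)
        for e, m in zip(education, motivation)
    ]
    # Omitting motivation leaves it in the error term, correlated with education.
    naive = ols_slope(education, income)
    # Frisch-Waugh-Lovell: equals the education coefficient from regressing
    # income on both education and motivation.
    controlled = ols_slope(residualize(education, motivation), income)
    return naive, controlled


if __name__ == "__main__":
    naive, controlled = simulate_education_income()
    print(f"omitting motivation:    {naive:.3f}")
    print(f"controlling motivation: {controlled:.3f}")
```

Under these assumptions the omitted-variable bias is 1.5 × Cov(education, motivation) / Var(education) = 1.5 × 0.8 / 1.64 ≈ 0.73, so the naive slope should land near 1.73 while the controlled slope should land near 1.0. In real data motivation is usually unobserved, which is exactly why the controlled regression is unavailable and endogeneity becomes a problem.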
Endogeneity can also arise through simultaneity, where two variables influence each other at the same time. For instance, in a model of supply and demand for a product, price and quantity are determined jointly in market equilibrium: price affects the quantity demanded, and quantity in turn affects the market price. In such cases price is endogenous, so treating it as an ordinary independent predictor of quantity, without addressing simultaneity, yields biased estimates of its effect.
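Simultaneity can be illustrated the same way. This minimal Python sketch assumes a hypothetical linear market in which demand is q = 10 - p + u_d and supply is q = 2 + p + u_s, with independent standard-normal demand and supply shocks u_d and u_s. Each simulated observation solves both equations for the equilibrium price and quantity, and the code then regresses quantity on price as if price were an ordinary independent predictor. The function names and all parameter values are illustrative assumptions.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_market(n=50_000, demand_slope=1.0, supply_slope=1.0, seed=2):
    """Regress equilibrium quantity on equilibrium price in a simulated market."""
    rng = random.Random(seed)
    prices, quantities = [], []
    for _ in range(n):
        u_d = rng.gauss(0, 1)  # demand shock
        u_s = rng.gauss(0, 1)  # supply shock
        # Demand: q = 10 - demand_slope * p + u_d
        # Supply: q = 2 + supply_slope * p + u_s
        # Setting the two equal and solving for p gives the equilibrium price.
        p = (10 - 2 + u_d - u_s) / (demand_slope + supply_slope)
        q = 2 + supply_slope * p + u_s
        prices.append(p)
        quantities.append(q)
    return ols_slope(prices, quantities)


if __name__ == "__main__":
    print(f"OLS slope of quantity on price: {simulate_market():.3f}")
```

For this linear system the least-squares slope converges to b_s - σ_s²(b_d + b_s) / (σ_d² + σ_s²), where b_d and b_s are the demand and supply slopes and σ_d², σ_s² are the shock variances. With the values above that is 1 - 2/2 = 0, so the regression suggests price has essentially no effect on quantity, even though the true demand slope is -1 and the true supply slope is +1. The estimate mixes the two curves rather than recovering either one.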