What is the Coefficient of Determination?
The coefficient of determination measures how well a linear regression model explains the variation in the outcome variable. It is a goodness-of-fit measure that evaluates how closely the model’s predicted values match the actual data. A higher value means the model accounts for more of the outcome’s variability.
The coefficient of determination is commonly referred to as R-squared or R². For a least squares model that includes an intercept, its value ranges from 0 to 1:
- 0 means the model does not explain any of the variation in the outcome.
- 1 means the model perfectly predicts the outcome.
The coefficient of determination for the regression model on the left is 15%, and for the model on the right it is 85%. When a linear model accounts for more of the variance, the data points fall closer to the regression line. In practice, you’ll almost never see a model with an R² of 100%. That would mean the fitted values equal the observed data values exactly, and all observations lie perfectly on the regression line.
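The two ends of the scale can be checked numerically. The sketch below is illustrative only: the data values are made up, and the helper `r_squared` is a hypothetical name that applies the sums-of-squares definition described in the formula section.

```python
import numpy as np

def r_squared(y, y_hat):
    """Return 1 - SS_residual / SS_total for observed y and fitted y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_residual = np.sum((y - y_hat) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_residual / ss_total

y = np.array([2.0, 4.0, 5.0, 7.0, 9.0])  # made-up observations

# Fitted values identical to the data: nothing is left unexplained.
r2_perfect = r_squared(y, y)

# Predicting the mean for every observation: no variation is explained.
r2_mean_only = r_squared(y, np.full_like(y, y.mean()))

print(r2_perfect, r2_mean_only)
```

A model whose fitted values reproduce the data gives the upper end of the scale, while a model that only ever predicts the mean gives the lower end.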


In general, a higher coefficient of determination indicates that, for a given dataset, the predicted values are closer to the actual values. However, a high value does not necessarily mean the model is appropriate. For instance, the model might be overfit, inflating the coefficient of determination by capturing noise rather than signal. Or it might miss curvature and interaction effects, or omit relevant predictors, even though the R² appears high. In short, even with a high value, you still need to assess the assumptions for least squares regression.
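The overfitting point can be illustrated with a small simulation. This is a sketch under stated assumptions, not a diagnostic procedure: the data are synthetic, the seed and sample size are arbitrary, and `fit_r_squared` is a hypothetical helper. It fits one model with a single real predictor and another that adds 15 columns of pure noise, then compares the in-sample R² values.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability
n = 30
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)  # synthetic data: one real predictor plus noise

def fit_r_squared(X, y):
    """Fit least squares with an intercept and return the in-sample R-squared."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

r2_base = fit_r_squared(x.reshape(-1, 1), y)

# Append 15 columns of pure noise that have no relationship to y.
noise = rng.normal(size=(n, 15))
r2_padded = fit_r_squared(np.column_stack([x, noise]), y)

print(f"one real predictor:       {r2_base:.3f}")
print(f"plus 15 noise predictors: {r2_padded:.3f}")
```

Because the larger model contains the smaller one, its in-sample R² cannot be lower, even though the extra columns carry no information about the outcome.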
Coefficient of Determination Formula
There are two coefficient of determination formulas, depending on the type of regression.
In simple linear regression, it is the square of Pearson’s correlation coefficient r:
$$ R^2 = r^2 $$
In multiple linear regression, the coefficient of determination formula uses the regression model’s sums of squares values:
$$ R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} $$
where SSresidual is the residual sum of squares (RSS) and SStotal is the total sum of squares of the dependent variable around its mean.
The SSresidual / SStotal ratio represents the proportion of variation in the outcome that the model does not explain. Hence, subtracting this ratio from 1 gives the proportion that the model explains.
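For a simple linear regression fitted by least squares with an intercept, the two formulas give the same number. The sketch below checks this on made-up study-hours and test-score values; the variable names and the data are illustrative assumptions, not taken from a real study.

```python
import numpy as np

# Made-up study-hours and test-score values, for illustration only.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
scores = np.array([52.0, 58.0, 61.0, 70.0, 72.0, 80.0])

# Fit a simple linear regression by least squares.
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(hours, scores, deg=1)
fitted = intercept + slope * hours

# Sums-of-squares formula: 1 - SS_residual / SS_total.
ss_residual = np.sum((scores - fitted) ** 2)
ss_total = np.sum((scores - scores.mean()) ** 2)
r2_from_sums = 1.0 - ss_residual / ss_total

# Correlation formula: the square of Pearson's r.
r = np.corrcoef(hours, scores)[0, 1]
r2_from_r = r ** 2

print(r2_from_sums, r2_from_r)
```

The two printed values should agree up to floating-point rounding. With more than one predictor there is no single r to square, so the sums-of-squares form is the one to use.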
Example
For example, a model predicting test scores from study hours yields a coefficient of determination of 0.72. This means that the model explains 72% of the variation in test scores, while the remaining 28% is unexplained, reflecting factors not included in the model and random variation.