What is the Sum of Squares?
The sum of squares (SS) is a statistic that measures the variability of a dataset’s observations around the mean. It’s the cumulative total of each data point’s squared difference from the mean.
Larger values indicate a greater degree of dispersion. However, it is an unscaled measure that doesn’t adjust for the number of data points. Adding new data points to a dataset is virtually guaranteed to cause the SS to increase, and it can never decrease.
In short, a larger dataset will have a higher sum of squares simply because it contains more observations. Consequently, you can’t compare these values between different-sized datasets. Additionally, this statistic is in squared units, further reducing interpretability.
Despite these shortcomings, SS is an invaluable tool for assessing linear models by partitioning their variance. This process breaks down the variability in our data and linear model into distinct components, helping us evaluate our model. More on that shortly!
Let’s move on to understanding the sum of squares formula and how it is the starting point for other variability measures. Then I’ll show you how it’s a fundamental component of least squares regression. Let’s dive in!
Related post: Measures of Variability
Sum of Squares Formula
The sum of squares formula is the following:

SS = Σ(Xᵢ − X̅)²
Where:
- Σ represents the sum for all observations from 1 to n.
- n is the sample size.
- Xᵢ is an individual data point.
- X̅ (pronounced “X-bar”) is the mean of the data points.
The sum of squares formula provides us with a measure of variability or dispersion in a dataset. To find the sum of squares, follow these steps:
- Take each data point and subtract the mean from it.
- Square that difference.
- Add each squared difference to a running total; the final total is the sum of squares.
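To make these steps concrete, here is a minimal Python sketch that computes the sum of squares by hand. The data values are made up purely for illustration:

```python
# Compute the sum of squares (SS) by following the three steps above.
data = [4, 7, 5, 9, 10]  # hypothetical data points, purely illustrative

mean = sum(data) / len(data)   # X-bar: the mean of the data points

ss = 0.0
for x in data:
    deviation = x - mean       # Step 1: subtract the mean from the data point
    squared = deviation ** 2   # Step 2: square that difference
    ss += squared              # Step 3: add it to the running total

print(f"Mean = {mean}, Sum of squares = {ss}")
# With these made-up values: mean = 7.0, SS = 26.0
```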
Notice how the squaring in the sum of squares formula ensures that the statistic tends to increase with each additional data point. Negative differences become positive after squaring, so every observation contributes a non-negative amount to the total. Only the rare observations that equal the mean exactly contribute zero to the sum.
The formula leaves the statistic in its squared form (i.e., it does not take the square root).
Finally, there is no denominator in the sum of squares formula to divide by the number of observations or degrees of freedom. That’s the unscaled nature of SS. This statistic grows with the sample size.
This sum of squares formula is the starting point for other variability measures that do factor in the sample size. Some also take the square root to use the data’s natural units. These statistics include the following:

- Variance: the sum of squares divided by the degrees of freedom (n − 1 for a sample).
- Standard deviation: the square root of the variance, which returns the measure to the data’s natural units.
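As a quick sketch of how these measures build on SS, the snippet below (using the same made-up numbers as above and Python’s standard statistics module) divides SS by n − 1 to get the sample variance and takes the square root for the standard deviation:

```python
import statistics

data = [4, 7, 5, 9, 10]  # same hypothetical values as above

mean = sum(data) / len(data)
ss = sum((x - mean) ** 2 for x in data)  # sum of squares
n = len(data)

variance = ss / (n - 1)       # sample variance scales SS by the degrees of freedom
std_dev = variance ** 0.5     # standard deviation returns to the data's natural units

# These should match the standard library's sample variance and standard deviation.
assert abs(variance - statistics.variance(data)) < 1e-9
assert abs(std_dev - statistics.stdev(data)) < 1e-9
print(f"Variance = {variance}, Standard deviation = {std_dev:.4f}")
```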
SS in Regression Analysis
In regression analysis, the sum of squares (SS) is particularly helpful because it separates variability into three types: total SS, regression SS, and error SS. After explaining them individually, I’ll show you how they work together.
Total Sum of Squares (TSS)
TSS measures the total variability in your data. It’s essentially the sum of squares we discussed earlier, but linear regression applies it to your response variable.
It measures the overall variability of the dependent variable around its mean. Consider it the total amount of variation available for your model to explain.
The total sum of squares formula is the following:

TSS = Σ(yᵢ − ȳ)²

This value is the sum of the squared distances between the observed values of the dependent variable (yᵢ) and its mean (ȳ).
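As a small illustration, here is a hedged Python sketch that computes TSS for a made-up set of response values:

```python
# Total sum of squares (TSS) for a hypothetical response variable y.
y = [3.0, 5.0, 7.0, 6.0, 9.0]   # made-up observed values of the dependent variable

y_bar = sum(y) / len(y)          # mean of the dependent variable
tss = sum((yi - y_bar) ** 2 for yi in y)

print(f"y-bar = {y_bar}, TSS = {tss}")
# With these made-up values: y-bar = 6.0, TSS = 20.0
```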
Regression Sum of Squares (RSS)
RSS measures the variability in the model’s predicted values around the dependent variable’s mean. It reflects the additional variability your model explains compared to a model that contains no variables and uses only the mean to predict the dependent variable. In simpler terms, it’s the amount of variability that your model explains.
Higher RSS values indicate that your model explains more of your data’s variability.
The regression sum of squares formula is the following:

RSS = Σ(ŷᵢ − ȳ)²

This value is the sum of the squared distances between the fitted values (ŷᵢ) and the mean of the dependent variable (ȳ). Fitted values are your model’s predictions for each observation.
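To show RSS in practice, here is a minimal sketch, assuming NumPy is available and using made-up x and y values, that fits a simple least squares line with numpy.polyfit and computes RSS from the fitted values:

```python
import numpy as np

# Hypothetical data: a single predictor x and response y (values are made up).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 6.0, 9.0])

# Fit a simple least squares line and compute the fitted values (y-hat).
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# RSS: squared distances between the fitted values and the mean of y.
rss = np.sum((y_hat - y.mean()) ** 2)
print(f"RSS = {rss:.3f}")
```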
Note: Some notation uses RSS for residual SS instead of regression SS. Be aware of this potentially confusing acronym switch!
Error Sum of Squares (SSE)
Finally, we reach SSE, the portion of variability that your regression model does not capture. It sums the squared distances between the data points and the fitted values.
For a specific dataset, smaller SSE values indicate that the observations fall closer to the fitted values. Typically, you want this number to be as low as possible because it suggests that your model’s predictions are close to the actual data values. In other words, they’re good predictions.
Ordinary least squares (OLS) regression minimizes SSE, meaning no other straight line produces a smaller sum of squared errors for your data. That minimization is why statisticians named the procedure least squares!
Learn How to Find the Least Squares Line.
The error sum of squares formula is the following:

SSE = Σ(yᵢ − ŷᵢ)²

This value is the sum of the squared distances between the data points (yᵢ) and the fitted values (ŷᵢ). Alternatively, statisticians refer to it as the residual sum of squares because it sums the squared residuals (yᵢ − ŷᵢ).
Learn more in-depth about SSE, also known as the residual sum of squares.
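Continuing the same hypothetical data and fit from the RSS sketch, this short snippet computes the residuals and sums their squares to get SSE:

```python
import numpy as np

# Same made-up data and least squares fit as in the RSS sketch above.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 6.0, 9.0])

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

residuals = y - y_hat           # distances between the data points and the fitted values
sse = np.sum(residuals ** 2)    # error (residual) sum of squares
print(f"SSE = {sse:.3f}")
```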
Relationship Between the Types of SS
In the context of regression, these three types of SS serve as our map, guiding us through the variability. This partitioning reveals the proportion of the explained and unexplained variance and our model’s performance.
Understanding how each sum of squares relates to the others is straightforward:
- RSS represents the variability that your model explains. Higher is usually good.
- SSE represents the variability that your model does not explain. Smaller is generally good.
- TSS represents the variability inherent in your dependent variable.
Consequently, these three statistics have the following mathematical relationship:
RSS + SSE = TSS
Or, Explained Variability + Unexplained Variability = Total Variability
Simple math!
Because TSS is fixed for a given dataset, as you fit better models, RSS increases and SSE decreases by precisely the same amount. RSS cannot be greater than TSS, while SSE cannot be less than zero.
Additionally, if you take RSS / TSS (or 1 − SSE / TSS), you get the proportion of the dependent variable’s variability that your model explains. That proportion is the R-squared statistic, a vital goodness-of-fit measure for linear models!
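As a sanity check on this relationship, here is a short sketch, using the same made-up data as the earlier snippets, that verifies RSS + SSE = TSS numerically and computes R-squared as RSS / TSS:

```python
import numpy as np

# Putting the pieces together with the same hypothetical data as before.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 6.0, 9.0])

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

tss = np.sum((y - y.mean()) ** 2)        # total variability
rss = np.sum((y_hat - y.mean()) ** 2)    # explained variability
sse = np.sum((y - y_hat) ** 2)           # unexplained variability

assert np.isclose(rss + sse, tss)        # RSS + SSE = TSS

r_squared = rss / tss                    # equivalently: 1 - sse / tss
print(f"TSS = {tss:.1f}, RSS = {rss:.1f}, SSE = {sse:.1f}, R-squared = {r_squared:.3f}")
```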
Learn How to Interpret R-squared.