Sum of Squares: Definition, Formula & Types

By Jim Frost

What is the Sum of Squares?

The sum of squares (SS) is a statistic that measures the variability of a dataset’s observations around the mean. It’s the cumulative total of each data point’s squared difference from the mean.

[Figure: drawing comparing low and high variability. Variability measures how far observations fall from the center.]

Larger values indicate a greater degree of dispersion. However, it is an unscaled measure that doesn’t adjust for the number of data points. Adding new data points to a dataset is virtually guaranteed to cause the SS to increase, and it can never decrease.

In short, a dataset tends to have a higher sum of squares simply because it contains more observations. Consequently, you can’t compare these values between datasets of different sizes. Additionally, this statistic is in squared units, which further reduces interpretability.

Despite these shortcomings, SS is an invaluable tool for assessing linear models by partitioning their variance. This process breaks down the variability in our data and linear model into distinct components, helping us evaluate our model. More on that shortly!

Let’s move on to understanding the sum of squares formula and how it is the starting point for other variability measures. Then I’ll show you how it’s a fundamental component of least squares regression. Let’s dive in!

Related post: Measures of Variability

Sum of Squares Formula

The sum of squares formula is the following:

SS = Σ (Xᵢ − X̅)²

Where:

  • Σ represents the sum for all observations from 1 to n.
  • n is the sample size.
  • Xᵢ is an individual data point.
  • X̅ (pronounced “X-bar”) is the mean of the data points.

The sum of squares formula provides us with a measure of variability or dispersion in a dataset. To find the sum of squares, follow these steps:

  1. Take each data point and subtract the mean from it.
  2. Square that difference.
  3. Add all the squared differences together.

Notice how the squaring process in the sum of squares formula ensures that it tends to increase with each additional data point. Negative differences are squared, producing a positive value that adds to the total. Only the rare observations that equal the mean exactly contribute zero to the sum.

The formula leaves the statistic in its squared form (i.e., it does not take the square root).

Finally, there is no denominator in the sum of squares formula to divide by the number of observations or degrees of freedom. That’s the unscaled nature of SS. This statistic grows with the sample size.
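To make those three steps concrete, here’s a minimal Python sketch. The data values are made up purely for illustration:

```python
# Minimal sketch: computing the sum of squares by hand (hypothetical data).
data = [4, 7, 9, 12, 3]            # example observations, chosen only for illustration
n = len(data)
mean = sum(data) / n               # X-bar

# Steps 1 and 2: subtract the mean from each point and square the difference.
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: add all the squared differences together.
ss = sum(squared_diffs)

print(f"Mean = {mean}, Sum of Squares = {ss}")   # mean = 7.0, SS = 54.0
```

Notice that nothing in the calculation divides by the sample size, which is exactly why the statistic keeps growing as you add observations.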

This sum of squares formula is the starting point for other variability measures that do factor in the sample size. Some also take the square root to use the data’s natural units. These statistics include the following:

  • Variance
  • Standard deviation
  • Mean squared error
  • Root mean squared error
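As a rough sketch of how SS feeds the first two of these statistics, the sample variance divides SS by the degrees of freedom (n − 1), and the standard deviation takes the square root to return to the data’s natural units. (MSE and RMSE apply the same scale-then-root idea to a model’s error sum of squares, which we’ll get to below.) The numbers here are the same hypothetical values as above:

```python
import math

data = [4, 7, 9, 12, 3]                       # same hypothetical data as above
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)       # sum of squares = 54.0

variance = ss / (n - 1)                       # scale SS by the degrees of freedom
std_dev = math.sqrt(variance)                 # return to the data's natural units

print(f"SS = {ss}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
```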

SS in Regression Analysis

In regression analysis, the sum of squares (SS) is particularly helpful because it separates variability into three types: total SS, regression SS, and error SS. After explaining them individually, I’ll show you how they work together.

Total Sum of Squares (TSS)

TSS measures the total variability in your data. It’s essentially the sum of squares we discussed earlier, but linear regression applies it to your response variable.

It measures the overall variability of the dependent variable around its mean. Consider it the total amount of variation available for your model to explain.

The total sum of squares formula is the following:

TSS = Σ (yᵢ − ȳ)²

This value is the sum of the squared distances between the observed values of the dependent variable (yᵢ) and its mean (ȳ).
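In code, TSS is just the ordinary sum of squares applied to the observed response values. A quick sketch with hypothetical y values:

```python
# Hypothetical response values; TSS is the SS of y around its own mean.
y = [5, 8, 11, 14, 12]
y_bar = sum(y) / len(y)
tss = sum((yi - y_bar) ** 2 for yi in y)
print(f"TSS = {tss}")   # 50.0 for these made-up values
```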

Regression Sum of Squares (RSS)

RSS measures the variability in the model’s predicted values around the dependent variable’s mean. It reflects the additional variability your model explains compared to a model that contains no variables and uses only the mean to predict the dependent variable. In simpler terms, it’s the amount of variability that your model explains.

Higher RSS values indicate that your model explains more of your data’s variability.

The regression sum of squares formula is the following:

RSS = Σ (ŷᵢ − ȳ)²

This value is the sum of the squared distances between the fitted values (ŷᵢ) and the mean of the dependent variable (ȳ). Fitted values are your model’s predictions for each observation.
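The sketch below continues the hypothetical data from the TSS example. It uses numpy.polyfit to fit a simple straight line, purely as a convenient stand-in for whatever model you are actually using, and then sums the squared distances between the fitted values and ȳ:

```python
import numpy as np

# Hypothetical data and a simple straight-line fit (stand-in for your model).
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 8, 11, 14, 12])

slope, intercept = np.polyfit(x, y, deg=1)   # ordinary least squares fit
y_hat = slope * x + intercept                # fitted (predicted) values

y_bar = y.mean()
rss = np.sum((y_hat - y_bar) ** 2)           # variability the model explains
print(f"RSS = {rss:.2f}")
```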

Note: Some notation uses RSS for residual SS instead of regression SS. Be aware of this potentially confusing acronym switch!

Error Sum of Squares (SSE)

Finally, we reach SSE, the portion of variability not captured by your regression model. It’s the overall variability of the distances between the data points and the fitted values.

For a specific data set, smaller SSE values indicate that the observations fall closer to the fitted values. Typically, you want this number to be as low as possible because it suggests that your model’s predictions are close to the actual data values. In other words, they’re good predictions.

Ordinary least squares (OLS) regression minimizes SSE, which means you get the best possible line. That’s why statisticians named the procedure OLS!

Learn How to Find the Least Squares Line.

The error sum of squares formula is the following:

SSE = Σ (yᵢ − ŷᵢ)²

This value is the sum of the squared distances between the data points (yᵢ) and the fitted values (ŷᵢ). Alternatively, statisticians refer to it as the residual sum of squares because it sums the squared residuals (yᵢ − ŷᵢ).

Learn more in-depth about SSE, also known as the residual sum of squares.
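Continuing the same hypothetical line fit, this sketch computes SSE from the residuals and then nudges the slope away from the OLS estimate to show that any other line produces a larger SSE, which is exactly what “least squares” means:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 8, 11, 14, 12])             # hypothetical data

slope, intercept = np.polyfit(x, y, deg=1)   # OLS fit
y_hat = slope * x + intercept

sse = np.sum((y - y_hat) ** 2)               # sum of the squared residuals

# Perturb the slope: the OLS coefficients minimize SSE, so this line does worse.
worse_fit = (slope + 0.5) * x + intercept
sse_worse = np.sum((y - worse_fit) ** 2)

print(f"SSE (OLS) = {sse:.2f}, SSE (perturbed slope) = {sse_worse:.2f}")
```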

Relationship Between the Types of SS

In the context of regression, these three types of SS serve as our map, guiding us through the variability. This partitioning reveals the proportions of explained and unexplained variance, which tell us how well our model performs.

Understanding how each sum of squares relates to the others is straightforward:

  • RSS represents the variability that your model explains. Higher is usually good.
  • SSE represents the variability that your model does not explain. Smaller is generally good.
  • TSS represents the variability inherent in your dependent variable.

Consequently, these three statistics have the following mathematical relationship:

RSS + SSE = TSS

Or, Explained Variability + Unexplained Variability = Total Variability

Simple math!

As you fit better models for the same dataset, RSS increases and SSE decreases by precisely the same amount. RSS cannot be greater than TSS, while SSE cannot be less than zero.

Additionally, if you take RSS / TSS (or 1 – SSE / TSS), you get the proportion of the variability of the dependent variable that your model explains. That percentage is the R-squared statistic—a vital goodness-of-fit measure for linear models!
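Here’s a sketch that ties it all together using the same hypothetical data and straight-line fit as above: compute TSS, RSS, and SSE, confirm that RSS + SSE equals TSS (up to floating-point rounding), and recover R-squared both ways:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 8, 11, 14, 12])             # hypothetical data

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept
y_bar = y.mean()

tss = np.sum((y - y_bar) ** 2)               # total variability
rss = np.sum((y_hat - y_bar) ** 2)           # explained variability
sse = np.sum((y - y_hat) ** 2)               # unexplained variability

print(f"RSS + SSE = {rss + sse:.2f}, TSS = {tss:.2f}")        # equal (within rounding)
print(f"R-squared = {rss / tss:.3f} = {1 - sse / tss:.3f}")   # both give the same value
```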

Learn How to Interpret R-squared.


Comments

  1. Patrick D says

    January 12, 2025 at 10:58 am

    Very nice overview. I would like to add one small caveat:
While the OLS regression line does provide the best fit for the data, its regression coefficient underestimates the true relationship between y and x in cases where there is measurement error in the independent variable. In such cases, a Deming or errors-in-variables regression might offer more accurate estimates.

    • Jim Frost says

      January 13, 2025 at 2:08 pm

      Hi Patrick,

You’re right on with your comment! In fact, there are multiple assumptions that you must satisfy before knowing that OLS provides the best fit. Whenever one of these assumptions is not satisfied, you typically need to use a different type of regression. As you mention, Deming regression addresses one such assumption violation. For the other assumptions, read my post about OLS Assumptions.

For more information about OLS providing the best linear unbiased estimates (BLUE), read my post about the Gauss-Markov Theorem.

  2. Jerry says

    May 31, 2023 at 11:57 am

    Thank you for this very good refresher on sums of squares. And it’s good you pointed out the possible acronym switch for some of the terms. I have had to sometimes point out how a regression line doesn’t fit the data well because of too much variability around the mean. Some people don’t realize that software will always draw a straight line with fitted OLS, even with random data, but a small R-squared and a higher SSE can make it a useless predictor.


