Orthogonal: Models, Definition & Finding

By Jim Frost

Orthogonality is a mathematical property that is beneficial for statistical models. It’s particularly helpful when performing factorial analysis of designed experiments.

Orthogonality has various mathematical and geometric definitions. In this post, I’ll define it mathematically and then explain its practical benefits for statistical models.

Terminology

First, here’s a bit of background terminology that you’ll encounter when discussing orthogonal matrices.

In math, a matrix is a two-dimensional rectangular array of numbers with columns and rows. A vector is simply a matrix that has either one row or one column.

For a regression model, the columns in your dataset are the independent and dependent variables. These columns are vectors.

When I refer to a vector in this context, you can think of a datasheet column representing a variable. Orthogonality applies specifically to the independent variables.
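As a quick, entirely hypothetical illustration in Python with NumPy (the dataset values and column names are made up, not from the post), a datasheet is a matrix and each of its columns is a vector:

```python
import numpy as np

# A hypothetical dataset: 4 rows and 3 columns (X1, X2, and the response Y).
data = np.array([
    [1.0, 2.0, 3.5],
    [2.0, 1.0, 4.1],
    [3.0, 0.0, 5.2],
    [4.0, 3.0, 6.8],
])

print(data.shape)  # (4, 3): a matrix with 4 rows and 3 columns

x1 = data[:, 0]    # one column = one vector = one independent variable
y = data[:, 2]     # the dependent variable column
```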

Related post: Independent and Dependent Variables

Orthogonal Definition

Vectors are orthogonal when the products of their matching elements sum to zero. That’s a mouthful, but it’s pretty simple to show how to find orthogonal vectors.

Follow these steps to calculate the sum of the vectors’ products.

  1. Multiply the first values of each vector.
  2. Multiply the second values, and repeat for all values in the vectors.
  3. Sum those products.

If the sum equals zero, the vectors are orthogonal.
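Those three steps amount to checking whether the dot product of the two vectors is zero. Here’s a minimal Python sketch of that check (the function name is my own, not something from the post):

```python
def is_orthogonal(v1, v2):
    """Return True when the sum of the elementwise products of v1 and v2 is zero."""
    if len(v1) != len(v2):
        raise ValueError("Both vectors must have the same number of elements.")
    # Steps 1 and 2: multiply matching elements; step 3: sum the products.
    return sum(a * b for a, b in zip(v1, v2)) == 0
```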

Let’s work through an example. Below are two vectors, V1 and V2. Each vector has five values.

[Figures: the two vectors, V1 and V2, each with five values.]

The table below multiplies the corresponding values in the two vectors and sums the products.

[Table: the elementwise products of V1 and V2 and their sum.]

Because the sum equals zero, the vectors are orthogonal.
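The specific values from the figures aren’t reproduced here, but any pair of vectors whose products sum to zero behaves the same way. Here’s the same calculation in Python with made-up orthogonal vectors, reusing the is_orthogonal sketch from above:

```python
# Illustrative five-element vectors (not the values shown in the figures above).
v1 = [1, 2, 3, -2, 1]
v2 = [2, 1, 0, 2, 0]

products = [a * b for a, b in zip(v1, v2)]
print(products)               # [2, 2, 0, -4, 0]
print(sum(products))          # 0, so the vectors are orthogonal
print(is_orthogonal(v1, v2))  # True
```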

For the discussion about orthogonality in linear models below, consider each vector to be an independent variable.

Orthogonality in Regression and ANOVA Models

Orthogonality provides essential benefits to linear models, even though that might not be obvious from the mathematical definition!

When independent variables are orthogonal, they are uncorrelated, which is beneficial. Statisticians refer to the correlation amongst independent variables as multicollinearity. A little bit is okay, but more can cause problems.

The best case is when there is no multicollinearity at all, which is an orthogonal model. Orthogonality indicates that the independent variables are genuinely independent. They are not associated at all—totally uncorrelated.

For orthogonal models, the coefficient estimates for the reduced model will be the same as those in the full model. In other words, you obtain the same estimated effects for the independent variables whether you test them individually or simultaneously. You can add or subtract the orthogonal variables without affecting the coefficients of the other variables. The same is true for including or excluding interaction effects.

Your interpretation is easier, and you’ll feel more confident about your results because the coefficients won’t change as you alter the model.

Alternatively, when the variables are not orthogonal, the coefficients can change when you adjust the variables in the model. The effects depend on the variables in the model to some degree. This condition can leave you feeling less sure about the correct effects!
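To make that invariance concrete, here’s a minimal NumPy sketch (illustrative code, not from the original post). The -1/+1 coded design, the “true” coefficients, and the noise level are all made up; the point is that the estimate for the first factor is the same whether or not the second, orthogonal factor is in the model:

```python
import numpy as np

# A made-up orthogonal design: two -1/+1 coded factor columns whose
# elementwise products sum to zero (and each column itself sums to zero).
x1 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
x2 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)

# Simulate a response with arbitrary effects plus random noise.
rng = np.random.default_rng(1)
y = 10 + 3 * x1 - 2 * x2 + rng.normal(scale=0.5, size=x1.size)

# Reduced model: intercept + x1 only.
X_reduced = np.column_stack([np.ones_like(x1), x1])
b_reduced, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)

# Full model: intercept + x1 + x2.
X_full = np.column_stack([np.ones_like(x1), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print("x1 coefficient, reduced model:", b_reduced[1])
print("x1 coefficient, full model:   ", b_full[1])
# Because x1 and x2 are orthogonal, the two estimates are identical.
```

If x2 were correlated with x1 instead, the two printed estimates would generally differ, which is exactly the coefficient instability described above.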

Related post: Multicollinearity: Problems, Detection, and Solutions

Orthogonal Designs in Factorial Experiments

It might sound unlikely that there would be absolutely no correlation between independent variables, that is, that the sum of the vectors’ products would equal exactly zero. And you’d be correct. There’s usually some correlation, even if just by chance. However, when you use statistical software to design an experiment, it uses an algorithm to create an orthogonal factorial design that meets your needs.

Factorial designs set up a variety of contrasts to see how they affect the outcome. For example, in a manufacturing process, researchers might include time, temperature, alloy, etc., as factors in an experiment about increasing the strength of their product. Typically, each factor has two levels, and the analysis compares the mean outcomes between them.

Factorial designs are special cases of ANOVA. Again, your statistical software uses an algorithm to set up factorial designs that are orthogonal. It’ll devise combinations of factor level settings for each experimental run that collectively produce orthogonality. This process ensures that each factor’s effect is estimated independently from the other factors.

To learn about factors and ANOVA, read my ANOVA Overview.

In the factorial design below, we have three factors, A, B, and C. Each factor has two settings. In the datasheet, a 1 represents one setting for a factor, while -1 is the other. In a real experiment, the analysts record the observed outcomes in a separate column.

[Datasheet: an orthogonal factorial design with factors A, B, and C coded as 1 and -1.]

By calculating the sums of the products for the factors, we can see this is an orthogonal design.
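As a rough sketch of that check (again, illustrative code rather than anything from the post), the snippet below builds a full 2³ factorial design in -1/+1 coding, similar in spirit to the datasheet above, and confirms that the products of every pair of factor columns sum to zero:

```python
import itertools

import numpy as np

# All eight -1/+1 combinations of three two-level factors (a full 2^3 design).
design = np.array(list(itertools.product([-1, 1], repeat=3)))
factors = {"A": design[:, 0], "B": design[:, 1], "C": design[:, 2]}

# For every pair of factor columns, sum the elementwise products.
for (name1, col1), (name2, col2) in itertools.combinations(factors.items(), 2):
    print(f"{name1} x {name2}: sum of products = {int(np.sum(col1 * col2))}")
# Each printed sum is 0, so the design is orthogonal and each factor's
# effect can be estimated independently of the others.
```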

A designed experiment is orthogonal when the products of each pair of factor columns sum to zero, allowing the analysis to estimate each factor’s effect independently of the others.


Comments

  1. Luã says

    September 27, 2021 at 1:39 am

    Great explanation! Is there a threshold for how far from zero the sum of the products can be before it affects the analysis?

    • Jim Frost says

      September 28, 2021 at 12:13 am

      Hi Luã,

      While perfect orthogonality is the best, there are thresholds for when that’s not possible. However, they’re discussed in terms of multicollinearity, which is the correlation amongst predictors. For more details about the thresholds and how to assess that, read my post, Multicollinearity: Problems, Detection, and Solutions.

  2. Funsho Olukade says

    September 26, 2021 at 10:51 pm

    Thanks, Jim, for always finding a simple way and language to explain complex statistical concepts.

  3. Jefferson says

    September 24, 2021 at 10:09 pm

    Nice read! Thank you, Jim.
