What is a Parsimonious Model?
A parsimonious model in statistics is one that uses relatively few independent variables to obtain a good fit to the data.
Analysts often assume that intricate problems require complex regression models. In practice, however, simpler models frequently produce more precise estimates. When evaluating several models with similar explanatory power, choose the simpler one.
Parsimonious models seek simplicity. This concept aligns with Occam’s razor, the guideline that, given two explanations that account for something equally well, the simpler one is usually preferable. When applied to statistical models, the principle of parsimony aims to explain data with the fewest possible parameters, providing vital benefits.
In this blog, I explore the benefits of parsimonious models and guide you through selecting one using statistical metrics like adjusted R-squared and Mallows’ Cp.
Parsimonious Model Benefits
As you add independent variables in regression analysis, the model always fits the sample data at least as well, and usually better. Consequently, there’s a tradeoff between parsimony and goodness-of-fit. Add more variables, and the fit improves, but parsimony decreases. Remove variables, and parsimony increases, but the fit worsens.
If you can get a better fit by adding variables, why strive for a parsimonious model? Why not add all your variables and call it a day?
Parsimonious models provide several vital benefits, including improved generalizability, increased precision, and easier interpretation.
Learn more about Independent and Dependent Variables: Differences & Examples.
Better Generalizability
When you add more variables, the fit improves. However, with too much complexity, the model tends to fit the random quirks of your sample. Statisticians refer to this problem as overfitting the model.
How is it a problem?
As you make a model more complex, you increasingly tailor it to the quirks of your particular dataset rather than to actual relationships in the population. This overfitting reduces generalizability and can produce results you can’t trust for the following reasons.
- Coefficient Estimates: Overfit models will precisely describe the relationships and random error in your specific sample but won’t reflect your target population.
- Predictions: Your model will precisely predict the data points in your sample, but the predictions will perform poorly with new data.
Complex models risk fitting the noise in the data, mistaking it for a real pattern. Parsimonious models curb this risk: by limiting the number of parameters, they mitigate overfitting and tend to generalize better to unseen data.
In short, a parsimonious model tends to generalize to the population and predict new data points better.
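To make the overfitting problem concrete, here’s a minimal sketch (my own illustration, not from any particular study): the true relationship is a straight line, but we also fit a deliberately over-complex polynomial to the same small training sample and compare errors on fresh test data. All names and values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# True relationship: y = 2 + 3x plus random noise.
x_train = np.linspace(0, 1, 15)
y_train = 2 + 3 * x_train + rng.normal(scale=0.5, size=15)
x_test = np.linspace(0.02, 0.98, 15)
y_test = 2 + 3 * x_test + rng.normal(scale=0.5, size=15)

def mse(coeffs, x, y):
    """Mean squared error of a polynomial fit evaluated on (x, y)."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

simple = np.polyfit(x_train, y_train, deg=1)    # parsimonious: 2 parameters
complex_ = np.polyfit(x_train, y_train, deg=9)  # overfit: 10 parameters

# The complex model always fits the training sample at least as well...
print("train:", mse(simple, x_train, y_train), mse(complex_, x_train, y_train))
# ...but because it chases noise, it typically predicts new data worse.
print("test: ", mse(simple, x_test, y_test), mse(complex_, x_test, y_test))
```

The degree-9 polynomial is guaranteed to match or beat the straight line on the training data, yet that in-sample “improvement” is exactly the kind of fit to random quirks that hurts predictions for new observations.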
Learn more about Overfitting Models: Problems, Detection & Avoidance.
Enhanced Precision through Reduced Variance
Incorporating more variables into a model can increase its variance even when you’re not overfitting the model. Parsimonious models counter this by favoring simplicity.
A parsimonious model tends to have reduced variance, enabling more precise coefficient estimates and predictions.
Easier Interpretation
A parsimonious model is easier to interpret because there are fewer parameters to consider.
A simpler model makes it easier to see how independent variables influence the dependent variable, fostering clearer insights.
How to Choose a Parsimonious Model
There is no rule of thumb for how many variables a parsimonious model contains. Instead, the number varies on a case-by-case basis. Furthermore, removing too many variables can bias your model, something you must avoid. Consequently, finding a good compromise between parsimony and goodness-of-fit that works for your specific dataset and model is crucial.
Specifying a good model is both an art and a science, and it’s a complicated topic. To learn more about that process, read my post, Specifying the Best Model. Here, I’ll focus on several statistics to help you choose a parsimonious model.
The goal is to find a model with few variables that fits the data nearly as well as a more complex model. When selecting a parsimonious model, we aim for simplicity, but not at the cost of excessive explanatory power.
Below are two statistics that can help you choose a parsimonious model that still captures the underlying patterns in the data. Both the adjusted R-squared and Mallows’ Cp evaluate model fit while factoring in the number of variables in the model.
Adjusted R-squared
The adjusted R-squared modifies the regular R-squared by accounting for the number of predictors in the model. Unlike the regular R-squared, which can only increase as you add more variables, the adjusted R-squared can decrease if the new variables don’t improve the model sufficiently.
When comparing models, opt for the one with the higher adjusted R-squared value, as it signifies a better fit after accounting for the number of variables.
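The adjustment is simple enough to compute directly. Here’s a short sketch (the function name and example values are my own, not from any particular package):

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k predictors fit to n observations.

    Unlike regular R-squared, this penalizes extra predictors and can
    decrease when a new variable adds too little explanatory power.
    """
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Adding two weak predictors nudges R-squared up from 0.80 to 0.81...
print(round(adjusted_r_squared(0.80, n=30, k=5), 4))  # 0.7583
# ...but adjusted R-squared drops, flagging the extra complexity as not worth it.
print(round(adjusted_r_squared(0.81, n=30, k=7), 4))  # 0.7495
```

Notice how the penalty term (n − 1)/(n − k − 1) grows with each added predictor, so small gains in R-squared can still lower the adjusted value.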
You can also use predicted R-squared. To learn more, read my post about adjusted and predicted R-squared.
Mallows’ Cp
Mallows’ Cp is a criterion used for model selection that aims to find a balance between goodness-of-fit and model complexity. Specifically, this statistic identifies smaller models with lower variance than those with all the variables. This reduction in variability translates to more precise coefficient estimates and predictions.
For a parsimonious model, select the model with a Cp value close to the number of predictors plus the intercept, ensuring simplicity while maintaining explanatory power. For example, a Mallows’ Cp near 4 fits the bill for a model with 3 independent variables and the constant.
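One common form of the statistic is Cp = SSE_p / MSE_full + 2p − n, where p counts the subset model’s parameters including the intercept. A minimal sketch (function and argument names are my own):

```python
def mallows_cp(sse_subset: float, mse_full: float, n: int, p: int) -> float:
    """Mallows' Cp for a candidate subset model.

    sse_subset: error sum of squares of the smaller candidate model
    mse_full:   mean squared error of the full model (all candidate predictors)
    n:          number of observations
    p:          parameters in the subset model, including the intercept
    """
    return sse_subset / mse_full + 2 * p - n

# An unbiased subset model has SSE close to MSE_full * (n - p), which
# drives Cp toward p -- here, 3 predictors + intercept = 4 parameters.
print(mallows_cp(sse_subset=2.0 * (50 - 4), mse_full=2.0, n=50, p=4))  # 4.0
```

A Cp well above p suggests the subset model is biased (it omits something important), while values near p indicate it fits about as well as the full model with fewer parameters.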
While I won’t cover it here in detail, the Bayesian Information Criterion (BIC) can also help you identify a parsimonious model because it considers the number of parameters. This statistic tends to favor simpler models. Find the BIC for your candidate models and then choose the one with the lowest BIC.
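For least-squares regression with Gaussian errors, BIC can be written (dropping constants that don’t affect comparisons) as n·ln(SSE/n) + k·ln(n). A brief sketch with hypothetical numbers:

```python
import math

def bic(sse: float, n: int, k: int) -> float:
    """BIC for a least-squares regression (Gaussian errors, constants dropped).

    sse: error sum of squares; n: observations; k: estimated parameters.
    The k * ln(n) penalty grows with sample size, so BIC tends to favor
    simpler models than criteria with a flat penalty like AIC's 2k.
    """
    return n * math.log(sse / n) + k * math.log(n)

# A 3-parameter model with nearly the same fit beats a 6-parameter one:
print(bic(sse=100.0, n=50, k=3) < bic(sse=95.0, n=50, k=6))  # True
```

Because only differences in BIC matter, compute it for each candidate model on the same dataset and pick the smallest value.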
Parsimonious Model Example
Let’s look at example results to see how to use these statistics in action. Imagine we have many variables but want to find a parsimonious model. The output below shows five independent variables for brevity, but imagine we have more.
We’re looking for a model with a high adjusted R-squared and a Mallows’ Cp close to the number of variables plus one. The circled model in the output satisfies these criteria.
This one might be our parsimonious model! It uses only four of the many variables we’re considering. Of course, we should ensure it is theoretically sound and check the residual plots.
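Since the dataset behind that output isn’t reproducible here, here’s a hedged sketch of the same workflow on synthetic data: five candidate predictors, of which only the first two truly matter, with adjusted R-squared and Mallows’ Cp computed for the best subset of each size. Everything here (data, seed, names) is my own illustration.

```python
import numpy as np
from itertools import combinations

# Synthetic data: five candidate predictors, but only the first two matter.
rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=n)

def sse(cols):
    """Error sum of squares for an OLS fit on the given predictor columns."""
    A = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return resid @ resid

full_p = 6                                # 5 predictors + intercept
mse_full = sse(range(5)) / (n - full_p)   # MSE of the full model
tss = np.sum((y - y.mean()) ** 2)

for k in range(1, 6):
    best = min(combinations(range(5), k), key=sse)  # best subset of size k
    s, p = sse(best), k + 1
    adj_r2 = 1 - (s / (n - p)) / (tss / (n - 1))
    cp = s / mse_full + 2 * p - n
    print(f"best {k}-variable model {best}: adj R2 = {adj_r2:.3f}, Cp = {cp:.1f}")
```

Note that the full model’s Cp always equals its own parameter count (here 6), so the statistic is only informative when comparing the smaller subsets against each other and against that baseline.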
Practical Steps for Finding a Parsimonious Model
Here are some tips for identifying the parsimonious model that generalizes well and is reasonably precise and unbiased.
- Start simple and add complexity only when it produces practically important goodness-of-fit improvements that stand on strong theoretical grounds.
- When comparing models, assess adjusted R-squared and Mallows’ Cp. Choose the one that balances simplicity and explanatory power.
- Don’t mindlessly chase a high R-squared. It might look good in your results, but it leads you away from the parsimonious model because it favors overfit models with low precision and generalizability.
- While finding a parsimonious model is important, don’t go overboard, because oversimplification can produce biased models. Check your residual plots to assess bias. In my experience, the simplest model with good-looking residual plots is often a great candidate for the parsimonious model. Learn how to read residual plots.
By employing metrics like adjusted R-squared and Mallows’ Cp, you can cut through the complexity and select models that are both concise and explanatory.