What is Partial Least Squares?
Partial Least Squares (PLS) is a statistical regression method that models complex relationships between sets of independent variables (predictors) and one or more dependent variables (responses). It is especially useful when the predictors are highly collinear, when the number of predictors exceeds the number of observations, or when the data are noisy.
Analysts use PLS widely in chemometrics, bioinformatics, finance, and other fields where high-dimensional data are common. It is suitable for regression, classification, and predictive modeling.
How It Works
Partial least squares regression works by extracting latent components that are linear combinations of the predictors.
- Unlike principal component analysis, whose components are chosen only to capture as much variance in X as possible, PLS chooses each component to maximize its covariance with the response Y. As a result, PLS selects components for their predictive power rather than for how much of X they summarize.
- Unlike ordinary multiple linear regression, PLS projects the possibly collinear predictors onto a small set of orthogonal latent variables. This process makes coefficient estimates much more stable when multicollinearity is present.
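The extraction described above can be sketched with NIPALS for a single response (PLS1), which is one common way to compute PLS. For multiple responses, other algorithms such as SIMPLS can give slightly different components. The function name `pls1_nipals`, the simulated data, and the choice of two components are illustrative assumptions. On each pass, the weight vector is proportional to X^T y, the direction whose scores have maximal covariance with the response. The scores are then formed, and X is deflated so that the next component is orthogonal to the ones before it.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Single-response PLS (PLS1) via NIPALS, a minimal illustrative sketch.

    Returns weights W, X-loadings P, scores T, component coefficients q,
    and regression coefficients B plus intercept for the original X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean  # centered predictors, deflated after each component
    yk = y - y_mean  # centered response, deflated after each component
    n, p = X.shape
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        # Weight vector: the unit direction in X-space whose scores have
        # maximal covariance with the (deflated) response.
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                 # component scores (the latent variable)
        tt = t @ t
        p_a = Xk.T @ t / tt        # X-loadings for this component
        q_a = yk @ t / tt          # regress the response on this component
        # Deflation: remove what this component explains, so the next
        # component's scores are orthogonal to this one.
        Xk = Xk - np.outer(t, p_a)
        yk = yk - q_a * t
        W[:, a], P[:, a], T[:, a], q[a] = w, p_a, t, q_a
    # Map the component model back to coefficients on the original predictors.
    B = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ B
    return W, P, T, q, B, intercept


# Simulated data (an assumption): two hidden factors drive 50 highly
# collinear predictors, with fewer observations (20) than predictors.
rng = np.random.default_rng(0)
n, p = 20, 50
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(n, p))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.05 * rng.normal(size=n)

W, P, T, q, B, b0 = pls1_nipals(X, y, n_components=2)
y_hat = X @ B + b0
print(np.round(T.T @ T, 6))          # off-diagonal entries are ~0
print(np.corrcoef(y, y_hat)[0, 1])   # fit quality on the training data
```

The first weight vector is X^T y rather than the top eigenvector of X^T X. This is the PCA contrast in code: the first component follows the response, not simply the largest source of variance in X.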
Partial least squares is a flexible and powerful technique, particularly well suited to high-dimensional data with multicollinearity or small sample sizes. It still produces stable models when the number of predictors exceeds the number of observations, a situation in which ordinary least squares has no unique solution. In these challenging cases, PLS often predicts better than traditional regression methods because it deliberately extracts mutually uncorrelated components that explain variation in both the predictors and the outcome.
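The case of more predictors than observations can be made precise. Take a centered n × p predictor matrix X with n < p and a response vector y. The ordinary least squares estimate needs the inverse of a p × p matrix whose rank cannot exceed n:

$$\hat{\beta}_{\text{OLS}} = \left(X^\top X\right)^{-1} X^\top y, \qquad \operatorname{rank}\!\left(X^\top X\right) = \operatorname{rank}(X) \le \min(n, p) = n < p,$$

so X^T X is singular and no unique OLS solution exists. PLS with k ≤ rank(X) components instead regresses y on the n × k score matrix T = [t_1, …, t_k]. Because the scores are mutually orthogonal, only a k × k diagonal matrix has to be inverted, which is possible as long as each score vector is nonzero:

$$\hat{q} = \left(T^\top T\right)^{-1} T^\top y, \qquad T^\top T = \operatorname{diag}\!\left(t_1^\top t_1, \dots, t_k^\top t_k\right), \qquad \hat{q}_a = \frac{t_a^\top y}{t_a^\top t_a}.$$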
However, a key limitation is interpretability. PLS fits a regression model to the new components, not to the original independent variables. Because these components are weighted combinations of the original predictors, determining the specific contribution of any one variable to the outcome is difficult. This lack of transparency makes PLS less suitable for studies that focus on explanation rather than prediction. While the method excels at summarizing and using the information in high-dimensional data, it obscures the role of individual variables.
In short, partial least squares is better at prediction than explanation.
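The interpretability limitation can be seen in the algebra. This assumes the NIPALS formulation of PLS, one common algorithm. In that formulation, the fitted component model can be rewritten in terms of the original centered predictors X using three quantities: the p × k weight matrix W, the p × k X-loading matrix P, and the vector q of component coefficients.

$$\hat{y} = T q = X W \left(P^\top W\right)^{-1} q = X \hat{\beta}, \qquad R = W \left(P^\top W\right)^{-1}, \qquad \hat{\beta}_j = \sum_{a=1}^{k} r_{ja}\, q_a .$$

Each implied coefficient β̂_j is a sum over all k components, and every one of those components blends all the predictors. Its value also generally shifts as components are added or removed. For both reasons, β̂_j is hard to read as the isolated effect of predictor j.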
PLS Example
For example, in a chemical spectroscopy study, researchers might use partial least squares to predict the concentration of a substance based on hundreds of correlated spectral measurements. PLS reduces the dimensionality of the data while preserving the variation that is most useful for predicting the target outcome.
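The spectroscopy scenario can be mimicked with simulated data. Everything here is hypothetical: the 300-point wavelength grid, the single Gaussian absorption band, the sloping baseline whose intensity varies from sample to sample, the noise level, and the 30/10 calibration split. `pls1_fit` is an illustrative NIPALS sketch, not a library function. Two components are used on the assumption that two sources of variation (the analyte and the baseline) need two components.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Minimal NIPALS PLS1 fit returning coefficients and an intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        # Covariance direction with the response; equals Xk.T @ (deflated y)
        # because the deflated Xk is orthogonal to earlier scores.
        w = Xk.T @ yc
        w /= np.linalg.norm(w)
        t = Xk @ w                     # component scores
        P.append(Xk.T @ t / (t @ t))   # X-loadings
        q.append(yc @ t / (t @ t))     # coefficient of y on this component
        W.append(w)
        Xk = Xk - np.outer(t, P[-1])   # deflate X before the next component
    W, P = np.column_stack(W), np.column_stack(P)
    B = W @ np.linalg.solve(P.T @ W, np.array(q))
    return B, y_mean - x_mean @ B


# Hypothetical spectra: 300 wavelengths, one analyte band, a varying baseline.
rng = np.random.default_rng(42)
wavelengths = np.linspace(1000, 2500, 300)                 # nm, assumed grid
band = np.exp(-0.5 * ((wavelengths - 1700) / 60) ** 2)     # analyte absorption
baseline = np.linspace(0.2, 0.4, wavelengths.size)         # shared background
n_samples = 40
conc = rng.uniform(0.0, 1.0, size=n_samples)               # true concentrations
scale = rng.normal(1.0, 0.1, size=n_samples)               # baseline intensity
spectra = (np.outer(conc, band) + np.outer(scale, baseline)
           + 0.01 * rng.normal(size=(n_samples, wavelengths.size)))

# Calibrate on 30 samples, then predict the 10 held-out samples.
B, b0 = pls1_fit(spectra[:30], conc[:30], n_components=2)
pred = spectra[30:] @ B + b0
rmse = np.sqrt(np.mean((pred - conc[30:]) ** 2))
print(f"{wavelengths.size} wavelengths -> 2 components, test RMSE {rmse:.4f}")
```

The 300 correlated measurements per sample are compressed into two component scores before regression. In practice, the number of components would normally be chosen by cross-validation rather than fixed in advance.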