Mahalanobis distance is a multivariate distance metric that measures how far a point lies from the center of a distribution while taking the correlations between variables into account. Unlike Euclidean distance, it rescales each variable by its variance and adjusts for the covariance between variables, which makes it useful for identifying outliers or clustering observations in multivariate data.
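For reference, the standard definition can be written as below. The symbols are introduced here and are not from the text above: x is the observation, μ is the mean vector of the distribution, and Σ is its covariance matrix.

```latex
D_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \, \Sigma^{-1} \, (\mathbf{x} - \boldsymbol{\mu})}
```

When Σ is the identity matrix, this reduces to the ordinary Euclidean distance between x and μ.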
For example, in a dataset with height and weight, Mahalanobis distance can identify individuals whose combined height and weight deviate unusually from the typical population pattern, even if neither variable alone seems extreme.
In the graph below, the Input and Output values for the circled point are each within their normal ranges. However, the point falls outside the typical pattern formed by the two variables together, so it has a large Mahalanobis distance. Outliers like this are easy to spot on a graph of two variables, but the Mahalanobis distance is most valuable when you have more than two variables and cannot plot them all at once.
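The height and weight example can be sketched in a few lines of Python with NumPy. Everything in this sketch is an assumption made for illustration: the data are made up (a deterministic height and weight trend with a small wiggle), the helper name mahalanobis_distances is hypothetical, and the mean and covariance are simply estimated from the sample itself.

```python
import numpy as np

def mahalanobis_distances(X):
    """Mahalanobis distance of each row of X from the sample mean (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)      # sample covariance; columns are variables
    cov_inv = np.linalg.inv(cov)       # assumes the covariance matrix is invertible
    diff = X - mean
    # Squared distance for row i: diff_i^T @ cov_inv @ diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(d2)

# Made-up data: weight rises with height, plus a small deterministic wiggle.
heights = 150.0 + np.arange(41)                                  # cm
weights = 0.9 * heights - 90.0 + 3.0 * np.sin(np.arange(41))     # kg
typical = np.column_stack([heights, weights])

# One person who is fairly tall but fairly light: each value alone is unremarkable.
unusual = np.array([[180.0, 55.0]])
data = np.vstack([typical, unusual])

distances = mahalanobis_distances(data)
# Per-variable z-scores, for comparison with the multivariate distance.
z_scores = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

print("z-scores of the unusual point:", z_scores[-1])
print("its Mahalanobis distance:", distances[-1])
print("largest distance among the rest:", distances[:-1].max())
```

The per-variable z-scores of the last point stay small, while its Mahalanobis distance is the largest in the sample, because tall and light together breaks the pattern the other points follow. The same function works unchanged with more than two columns, which is the case where a plot is no longer practical.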
