What is the Kolmogorov-Smirnov Test?
The Kolmogorov-Smirnov test (K-S test) is a nonparametric statistical test that compares distributions.
There are two forms of the test. The one-sample K-S test assesses whether a sample was drawn from a parent population that follows a specified theoretical distribution, such as the normal or exponential. The two-sample K-S test evaluates whether two independent samples were drawn from the same parent population. Because the test makes no assumptions about the shape of the distributions, it is especially useful in situations where parametric tests might not be appropriate.
The two-sample Kolmogorov-Smirnov test is one of the most useful and general nonparametric methods for comparing two independent samples. It is sensitive to differences in both the location and shape of the distributions, making it more informative than tests that focus only on means or variances.
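The one-sample version of the K-S test can test for normality or compare a sample to any other fully specified continuous reference distribution (such as exponential or uniform). If the sample deviates significantly from the specified distribution, the test produces a small p-value. The standard p-values assume that the reference distribution's parameters are fixed in advance. If they are estimated from the same sample, the test becomes conservative, and a corrected version such as the Lilliefors test should be used instead.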
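As a minimal sketch in plain Python (not a library API), the one-sample statistic can be computed by sorting the sample and comparing each step of its ECDF with a fully specified reference CDF. The function names normal_cdf and ks_one_sample_statistic are hypothetical, and the sample values are made up for illustration.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution whose mean and SD are fixed in advance."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_one_sample_statistic(sample, cdf):
    """Return D = max |ECDF(x) - cdf(x)| for a continuous reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x, so check the gap
        # just before the jump and just after it.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical sample compared with a standard normal (mean 0, SD 1 chosen in advance).
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
d = ks_one_sample_statistic(sample, normal_cdf)
print(f"D = {d:.3f}")
```

In practice, SciPy's scipy.stats.kstest computes this statistic together with a p-value.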
The Kolmogorov-Smirnov test works by comparing cumulative distribution functions. In the one-sample K-S test, it compares the empirical cumulative distribution function (ECDF) of the sample to the theoretical CDF of the specified reference distribution (such as normal or uniform). In the two-sample version, it compares the ECDFs of the two samples directly. In both cases, the test statistic D is the maximum vertical distance between the two curves being compared. The larger D is, the more the distributions differ, whether in location, spread, or shape.
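The following plain-Python sketch computes the two-sample statistic D directly from this definition. Because both ECDFs are step functions that change only at observed values, it is enough to evaluate the gap at every pooled observation. The helper names ecdf and ks_two_sample_statistic are hypothetical, and the data are made up.

```python
from bisect import bisect_right

def ecdf(sorted_values, x):
    """Fraction of observations less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)

def ks_two_sample_statistic(a, b):
    """Return D = max |ECDF_a(x) - ECDF_b(x)| over all observed values."""
    sa, sb = sorted(a), sorted(b)
    # Between observations both ECDFs are flat, so the largest gap
    # must occur at one of the pooled data points.
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)

# Hypothetical samples: b is a copy of a shifted by 2.
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
print(ks_two_sample_statistic(a, b))  # 0.5
```

SciPy's scipy.stats.ks_2samp returns the same statistic along with a p-value.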
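A small p-value from a K-S test suggests that the distributions differ significantly. However, the test assumes continuous distributions; with discrete data or many ties, the standard p-values are conservative. The test also has limited power for small samples or when the distributions differ only slightly, and because the largest ECDF gap tends to occur near the center of the distribution, it is relatively insensitive to differences in the tails.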
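To see why sample size matters, here is a sketch of the classical large-sample (asymptotic) p-value for the two-sample test. It uses the Kolmogorov distribution, P(K > λ) = 2 Σ (-1)^(k-1) exp(-2k²λ²) summed over k ≥ 1, with λ = D·sqrt(nm/(n+m)). The function names are hypothetical. Because this is only an approximation, it can be noticeably off for small n and m; SciPy's scipy.stats.ks_2samp can compute exact p-values for small samples instead.

```python
import math

def kolmogorov_sf(lam, terms=100):
    """Asymptotic P(K > lam) for the Kolmogorov distribution."""
    if lam < 0.2:
        # For lam < 0.2 the true survival probability differs from 1 by
        # less than 1e-12, and the alternating series below converges slowly.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

def ks_two_sample_pvalue_asymptotic(d, n, m):
    """Large-sample p-value for a two-sample K-S statistic d with sizes n and m."""
    lam = math.sqrt(n * m / (n + m)) * d
    return kolmogorov_sf(lam)

# The same observed gap D = 0.5 carries very different evidence
# depending on how many observations produced it.
print(ks_two_sample_pvalue_asymptotic(0.5, 4, 4))
print(ks_two_sample_pvalue_asymptotic(0.5, 100, 100))
```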
Alternative Distribution Tests
While there is no direct parametric equivalent to the Kolmogorov-Smirnov test, other tests serve related purposes, some as alternative goodness-of-fit tests and others as parametric tests that rely on stronger assumptions and answer narrower questions.
For one-sample comparisons, the Anderson-Darling test can assess whether a sample comes from a specified reference distribution, such as the normal or exponential, and gives more weight to the tails than the K-S test does. The Shapiro-Wilk test is another option, though it is limited to testing for normality.
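As a rough illustration of the tail weighting, here is a plain-Python sketch of the Anderson-Darling statistic A² against a fully specified continuous CDF, using the standard formula A² = -n - (1/n) Σ (2i - 1)[ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]. The log terms grow large in magnitude when an observation falls where F is close to 0 or 1, which is how the statistic emphasizes the tails. The function names are hypothetical, the sketch assumes 0 < F(x) < 1 for every observation, and turning A² into a p-value requires distribution-specific critical values that are not shown here.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with parameters fixed in advance."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def anderson_darling_statistic(sample, cdf):
    """Return A^2 for a sample against a fully specified continuous CDF.

    Assumes 0 < cdf(x) < 1 for every observation (otherwise log fails).
    """
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        lower = cdf(xs[i - 1])  # F(x_(i)) with 1-based i
        upper = cdf(xs[n - i])  # F(x_(n+1-i)) with 1-based i
        # Values of F near 0 or 1 make these logs large in magnitude,
        # which is how A^2 gives extra weight to the tails.
        total += (2 * i - 1) * (math.log(lower) + math.log(1.0 - upper))
    return -n - total / n

# Hypothetical sample compared with a standard normal.
print(anderson_darling_statistic([-1.2, -0.4, 0.1, 0.3, 0.9, 1.5], normal_cdf))
```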
For two-sample comparisons, parametric alternatives like the t-test (for comparing means) and the F-test (for comparing variances) focus on specific distribution parameters rather than the overall shape. These tests are often more powerful when their assumptions hold, but the Kolmogorov-Smirnov test remains more flexible because it compares the entire distributions without assuming a specific form.
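A small constructed example makes the contrast concrete. The two hypothetical samples below have identical means but very different spreads, so a Welch t statistic (computed by hand here, not through a library) is exactly zero while the K-S statistic D is not. The sketch compares test statistics only, not p-values, and the function names are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for a difference in means."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two ECDFs."""
    sa, sb = sorted(a), sorted(b)

    def gap(x):
        return abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))

    # The ECDFs only change at observed values, so check every pooled point.
    return max(gap(x) for x in sa + sb)

# Hypothetical samples: same mean (0), very different spread.
wide = [-3.0, -2.0, 2.0, 3.0]
narrow = [-0.5, -0.25, 0.25, 0.5]
print("Welch t:", welch_t(wide, narrow))
print("K-S D:", ks_statistic(wide, narrow))
```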
Kolmogorov-Smirnov Test Examples
Suppose a researcher wants to compare the distribution of customer purchase amounts before and after a promotional campaign. Instead of assuming the distributions are normal, they use a two-sample Kolmogorov-Smirnov test to compare the ECDFs. A significant result would suggest that the campaign altered the shape or center of the spending distribution.
In another case, a data analyst might use a one-sample K-S test to determine whether reaction times from an experiment follow a normal distribution. A significant result suggests that the data deviate from normality, although with large samples even small, practically unimportant deviations can produce a significant result.