Kolmogorov-Smirnov (Lilliefors) test

The Kolmogorov-Smirnov (KS) test measures the largest vertical distance between the empirical cumulative distribution function (ECDF) of a sample and a reference CDF. The Lilliefors variant adjusts the critical values for the case where the distribution parameters are estimated from the data, which is the usual situation when testing for normality.

KS test vs Lilliefors test

The two tests share the same statistic but differ in how the reference distribution is specified:

  • KS test: the reference distribution is fully specified (known parameters). For example, testing whether data follow a \(N(10, 4)\) with \(\mu=10\) and \(\sigma=2\) fixed in advance.
  • Lilliefors test: the parameters are estimated from the data (\(\hat{\mu} = \bar{x}\), \(\hat{\sigma} = S\)). This is what you do when testing for normality without knowing \(\mu\) and \(\sigma\). Estimating parameters from the data makes the test statistic stochastically smaller, so the KS critical values are too conservative: Lilliefors recomputed them via simulation.

In practice, when you say “KS test for normality,” you almost always mean the Lilliefors test.
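The conservativeness of the plain KS test with estimated parameters can be checked by simulation. The sketch below is illustrative only: the sample size, number of replications and seed are arbitrary assumptions. It draws samples that really are normal, so \(H_0\) is true, and records how often the naive KS test rejects at the 5% level.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
n     <- 30      # sample size (assumed for illustration)
n_sim <- 2000    # number of simulated samples

# p-values of the naive KS test when mu and sigma are estimated from each sample
p_naive <- replicate(n_sim, {
  x <- rnorm(n)
  ks.test(x, "pnorm", mean(x), sd(x))$p.value
})

# A correctly calibrated test would reject in about 5% of the samples
naive_reject_rate <- mean(p_naive < 0.05)
naive_reject_rate
```

A rejection rate far below 0.05 is the symptom of critical values that are too large for the estimated-parameter case.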

Test statistic

\[D = \sup_x \left| F_n(x) - F_0(x) \right|\]

where \(F_n(x)\) is the empirical CDF of the sample and \(F_0(x)\) is the reference CDF (normal with estimated parameters in the Lilliefors case). The larger \(D\) is, the more the sample deviates from the reference distribution.
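Because \(F_n\) is a step function, the largest gap can only occur at a data point, either just before or just after a jump. A minimal sketch of that computation follows; `lilliefors_D` is a hypothetical helper name, and the data are simulated stand-ins rather than real measurements.

```r
# Hypothetical helper: KS/Lilliefors distance against a normal CDF
# whose mean and sd are estimated from the sample.
lilliefors_D <- function(x) {
  x  <- sort(x)
  n  <- length(x)
  F0 <- pnorm(x, mean = mean(x), sd = sd(x))   # reference CDF at each data point
  d_plus  <- max(seq_len(n) / n - F0)          # ECDF just after each jump, minus F0
  d_minus <- max(F0 - (seq_len(n) - 1) / n)    # F0 minus ECDF just before each jump
  max(d_plus, d_minus)
}

set.seed(42)                              # arbitrary seed
x <- rnorm(30, mean = 12.4, sd = 1.8)     # simulated stand-in data
D <- lilliefors_D(x)
D

# Should agree with the statistic reported by base R's ks.test()
D_base <- unname(ks.test(x, "pnorm", mean(x), sd(x))$statistic)
all.equal(D, D_base)
```

Only the statistic is shared with `ks.test()`; the p-value that function reports is the standard KS one and is not valid here.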

Hypotheses: \(H_0\): the data follow the specified distribution. \(H_1\): they do not.

The p-value is computed by comparing \(D\) to the Lilliefors distribution (via simulation or dedicated tables), not the standard KS distribution.
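The simulation route can be sketched in a few lines. This is an assumption-laden illustration, not the algorithm used by any particular package: `lilliefors_mc` is a hypothetical helper, and the number of replications is arbitrary. It relies on the fact that \(D\) with estimated mean and standard deviation does not change under shifts or rescaling of the data, so the null distribution can be simulated from standard normal samples of the same size.

```r
# Hypothetical helper: Monte Carlo p-value for the Lilliefors test.
lilliefors_mc <- function(x, n_sim = 5000) {
  stat <- function(v) {
    v  <- sort(v)
    n  <- length(v)
    F0 <- pnorm(v, mean(v), sd(v))
    max(seq_len(n) / n - F0, F0 - (seq_len(n) - 1) / n)
  }
  d_obs <- stat(x)
  # Null distribution of D: normal samples with parameters re-estimated each time
  d_sim <- replicate(n_sim, stat(rnorm(length(x))))
  list(statistic = d_obs, p.value = mean(d_sim >= d_obs))
}

set.seed(7)          # arbitrary seed
x <- rexp(40)        # simulated, strongly skewed data
res <- lilliefors_mc(x)
res
```

The p-value is the share of simulated statistics at least as large as the observed one, so its resolution is limited to about `1 / n_sim`.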

[Figure: empirical CDF vs theoretical normal CDF, with the \(D\) statistic shown as the maximum vertical distance between the two curves.]

The red step function is the empirical CDF; the blue curve is the theoretical normal CDF with estimated parameters. The green segment marks the maximum distance \(D\).

Step-by-step example

A lab measures the viscosity of 30 samples. Before applying a \(t\)-test, the analyst checks for normality. The data give \(\bar{x} = 12.4\) cP, \(S = 1.8\) cP.

[Figure: ECDF vs normal CDF for the viscosity data, with the \(D\) statistic marked and the p-value from the Lilliefors test.]

If \(p > 0.05\), normality is not rejected and the \(t\)-test can proceed. In R:

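library(nortest)                       # provides lillie.test()
# x: numeric vector holding the 30 viscosity measurements (cP)
lillie.test(x)                         # Lilliefors test: valid when mean and sd are estimated
ks.test(x, "pnorm", mean(x), sd(x))    # plain KS with estimated parameters: p-value too large, do not use for normality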

Comparison with other normality tests

The KS/Lilliefors test is one of several options for testing normality. Each has different strengths:

| Test | Best for | Power | Notes |
|---|---|---|---|
| Lilliefors | General normality | Moderate | Parameters estimated from data |
| Shapiro-Wilk | Small to medium samples (\(n \leq 50\)) | High | Most powerful for normality |
| Anderson-Darling | Tail sensitivity | High | Weights tails more than center |
| KS (original) | Known parameters | Moderate | Too conservative when parameters estimated |

⚠️ Shapiro-Wilk is almost always preferred over Lilliefors for normality testing

For testing normality, Shapiro-Wilk has higher power than the Lilliefors test across almost all sample sizes and departures from normality. Unless you have a specific reason to use Lilliefors (e.g., software constraints), use Shapiro-Wilk. In R: shapiro.test(x), which accepts between 3 and 5000 observations.

The original KS test (with known parameters) should never be used for normality testing when the mean and variance are estimated from the data: the p-values will be too large (too conservative) because the KS critical values assume fully specified distributions.
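The power gap between the two tests can be illustrated with a small simulation. The sketch below covers one arbitrary scenario only (a chi-square distribution with 4 degrees of freedom, \(n = 30\)), so it is not a general proof. To stay in base R, the Lilliefors 5% critical value is itself obtained by simulation rather than from `nortest`; `lillie_D` is a hypothetical helper.

```r
set.seed(123)    # arbitrary seed
n     <- 30
n_sim <- 2000

# Lilliefors statistic: KS distance to a normal CDF with estimated parameters
lillie_D <- function(v) {
  v  <- sort(v)
  m  <- length(v)
  F0 <- pnorm(v, mean(v), sd(v))
  max(seq_len(m) / m - F0, F0 - (seq_len(m) - 1) / m)
}

# 5% critical value of D under H0, obtained by simulation
crit <- quantile(replicate(n_sim, lillie_D(rnorm(n))), 0.95, names = FALSE)

# Rejection indicators when the data are actually right-skewed
alt <- replicate(n_sim, {
  x <- rchisq(n, df = 4)
  c(shapiro    = shapiro.test(x)$p.value < 0.05,
    lilliefors = lillie_D(x) > crit)
})

power <- rowMeans(alt)   # estimated power of each test in this scenario
power
```

Other alternatives (heavy tails, bimodality) and other sample sizes can be explored by changing the generator inside `replicate()`.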

⚠️ All normality tests have low power for small samples and high power for large ones

For \(n < 20\), normality tests often fail to reject \(H_0\) even for clearly non-normal data because they have low power. For \(n > 100\), they often reject \(H_0\) for trivial departures from normality that have no practical consequence. A Q-Q plot is a useful complement to the formal test: it shows where and how the data depart from normality, which the p-value alone does not reveal.
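Both effects can be seen in a simulation. The two scenarios below are assumptions chosen for illustration: a uniform distribution as the clearly non-normal case with \(n = 15\), and a Student \(t\) distribution with 10 degrees of freedom as the mild departure with \(n = 1000\). `reject_rate` is a hypothetical helper.

```r
set.seed(2024)   # arbitrary seed
n_sim <- 2000

# Share of simulated samples for which Shapiro-Wilk rejects normality at 5%
reject_rate <- function(n, rgen) {
  mean(replicate(n_sim, shapiro.test(rgen(n))$p.value < 0.05))
}

# Small sample, clearly non-normal data (uniform: flat, no tails)
small_clear <- reject_rate(15, runif)

# Large sample, mild departure (Student t, 10 df: slightly heavy tails)
large_mild <- reject_rate(1000, function(n) rt(n, df = 10))

c(small_clear = small_clear, large_mild = large_mild)
```

One would expect the first rate to be low and the second to be considerably higher, though the exact figures depend on the seed and the scenarios chosen.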

💡 Practical workflow for normality checking

A robust approach to normality checking before parametric tests:

  1. Plot a Q-Q plot: qqnorm(x); qqline(x).
  2. Run Shapiro-Wilk: shapiro.test(x).
  3. If \(n > 50\), complement with Anderson-Darling: nortest::ad.test(x).
  4. Consider the robustness of the downstream test: the \(t\)-test is fairly robust to mild non-normality for \(n \geq 30\), so a minor departure from normality may not matter.

Abandon the parametric test only when the Q-Q plot shows a clear systematic departure and the normality test confirms it.
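The workflow above can be bundled into a small helper. This is a sketch under stated assumptions: `check_normality` and its arguments are hypothetical names, the thresholds simply mirror the steps listed, and the Anderson-Darling step runs only if the `nortest` package is installed.

```r
# Hypothetical helper bundling the workflow; names and thresholds are illustrative.
check_normality <- function(x, alpha = 0.05, plot = TRUE) {
  x <- x[!is.na(x)]
  if (plot) {                     # step 1: visual check
    qqnorm(x)
    qqline(x)
  }
  sw  <- shapiro.test(x)          # step 2: Shapiro-Wilk (needs 3 <= n <= 5000)
  out <- list(n = length(x), shapiro_p = sw$p.value, anderson_p = NA_real_)
  # step 3: Anderson-Darling for larger samples, only if nortest is available
  if (length(x) > 50 && requireNamespace("nortest", quietly = TRUE)) {
    out$anderson_p <- nortest::ad.test(x)$p.value
  }
  out$formal_rejection <- sw$p.value < alpha
  out
}

set.seed(99)                                   # arbitrary seed
x   <- rnorm(40, mean = 12.4, sd = 1.8)        # simulated stand-in data
res <- check_normality(x, plot = FALSE)
str(res)
```

The helper reports only the formal tests. Judging whether the Q-Q plot shows a systematic departure, and whether the downstream test is robust enough (step 4), remains with the analyst.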