Wilcoxon test

The Wilcoxon tests are the standard nonparametric alternatives to the \(t\)-tests. The signed-rank test replaces the paired \(t\)-test, and the rank-sum test replaces the two-sample \(t\)-test. Both use ranks instead of raw values, making them robust to non-normality and outliers while retaining more power than the sign test.

Wilcoxon signed-rank test (paired samples)

Used when the same subjects are measured twice, or when observations are matched. It tests whether the median difference between pairs equals zero.

Hypotheses: \(H_0\): the median difference is zero. \(H_1\): the median difference is not zero (or greater/less for one-sided tests).

Assumption: the differences \(d_i = X_{i,\text{after}} - X_{i,\text{before}}\) are symmetrically distributed around their median (not necessarily normal).

Procedure:

  1. Compute differences \(d_i\). Discard zeros.
  2. Rank the absolute differences \(|d_i|\) from 1 (smallest) to \(n\) (largest), where \(n\) is the number of nonzero differences. Assign average ranks to ties.
  3. Attach the original sign of each difference to its rank.
  4. Compute \(W^+ = \sum \text{positive signed ranks}\) and \(W^- = \sum \text{negative signed ranks}\).
  5. The test statistic is \(W = \min(W^+, W^-)\).
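  6. Compare \(W\) to the Wilcoxon signed-rank table (reject \(H_0\) if \(W\) is at or below the critical value), or compute the exact p-value from the null distribution of the statistic, under which each of the \(2^n\) ways of attaching signs to the ranks is equally likely.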
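Steps 1 to 5 translate directly into base R. The sketch below uses a hypothetical helper, `signed_rank_stat()`, written only for illustration (it is not part of R). It relies on `rank()`, whose default `ties.method = "average"` assigns the average ranks of step 2.

```r
# Hypothetical helper (not part of base R): signed-rank statistic, steps 1-5
signed_rank_stat <- function(before, after) {
  d <- after - before                 # step 1: paired differences
  d <- d[d != 0]                      # step 1: discard zero differences
  r <- rank(abs(d))                   # step 2: rank |d_i|; ties get average ranks
  signed <- sign(d) * r               # step 3: attach the sign of each difference
  w_plus  <- sum(signed[signed > 0])  # step 4: W+
  w_minus <- -sum(signed[signed < 0]) # step 4: W- (reported as a positive sum)
  list(n = length(d), w_plus = w_plus, w_minus = w_minus,
       W = min(w_plus, w_minus))      # step 5: W = min(W+, W-)
}

before <- c(7, 5, 8, 6, 9, 4, 7, 6)
after  <- c(4, 3, 5, 5, 6, 3, 5, 8)
res <- signed_rank_stat(before, after)
str(res)
```

For the pain-score data this gives \(W^+ = 4\) and \(W^- = 32\). The two sums always add up to \(n(n+1)/2\) (here 36), which is a quick check on hand-computed ranks.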
Pain scores before and after treatment

Eight patients rate pain (0-10) before and after physiotherapy:

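Patient Before After \(d_i\) \(|d_i|\) Rank Signed rank
1 7 4 -3 3 7.0 -7.0
2 5 3 -2 2 4.0 -4.0
3 8 5 -3 3 7.0 -7.0
4 6 5 -1 1 1.5 -1.5
5 9 6 -3 3 7.0 -7.0
6 4 3 -1 1 1.5 -1.5
7 7 5 -2 2 4.0 -4.0
8 6 8 +2 2 4.0 +4.0

Tied absolute differences share average ranks: the two 1s occupy ranks 1-2 and get 1.5, the three 2s occupy ranks 3-5 and get 4, and the three 3s occupy ranks 6-8 and get 7.

\(W^- = 7+4+7+1.5+7+1.5+4 = 32\), \(W^+ = 4\). As a check, \(W^+ + W^- = n(n+1)/2 = 36\). Test statistic: \(W = \min(32, 4) = 4\).

For \(n = 8\) and \(\alpha = 0.05\) (two-sided), the critical value is 3. Since \(W = 4 > 3\), we fail to reject \(H_0\). Because of the tied ranks the table is only approximate; counting sign patterns directly, 7 of the 256 equally likely patterns give \(W^+ \leq 4\) and 7 give \(W^+ \geq 32\), so the exact two-sided p-value is \(14/256 \approx 0.055\).

Seven of the eight patients reported less pain, but with only eight pairs the reduction falls just short of significance at the 5% level (\(p \approx 0.055\)).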
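Both numbers in this example can be reproduced in base R. The first part uses `psignrank()`, R's exact null distribution of \(W^+\) when there are no ties, to recover the table's critical value. The second part handles the tied ranks by brute force: under \(H_0\) each of the \(2^8 = 256\) sign assignments of the observed midranks is equally likely, so enumerating them gives an exact p-value conditional on the ties. This enumeration is an illustrative sketch, practical only for small \(n\).

```r
# Part 1: critical value from the no-ties null distribution, P(W+ <= w)
n <- 8
w <- 0:(n * (n + 1) / 2)
crit <- max(w[2 * psignrank(w, n) <= 0.05])  # largest w with two-sided prob <= 0.05
crit                                         # 3

# Part 2: exact p-value conditional on the observed (tied) midranks
before <- c(7, 5, 8, 6, 9, 4, 7, 6)
after  <- c(4, 3, 5, 5, 6, 3, 5, 8)
d <- after - before
d <- d[d != 0]
r <- rank(abs(d))                            # midranks 1.5, 4 and 7
w_obs <- sum(r[d > 0])                       # observed W+ = 4

# Each row of `signs` is one sign pattern (1 = positive, 0 = negative)
signs <- as.matrix(expand.grid(rep(list(0:1), length(r))))
w_null <- as.vector(signs %*% r)             # W+ under every pattern

total <- sum(r)                              # n(n+1)/2 = 36
w_min <- min(w_obs, total - w_obs)
# Two-sided: patterns at least as extreme in either tail
p_exact <- mean(w_null <= w_min | w_null >= total - w_min)
p_exact                                      # 14/256, about 0.0547
```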

Figure: dot plot of paired before and after pain scores, connected by lines, with the direction of change highlighted.

Wilcoxon rank-sum test (independent samples)

Also known as the Mann-Whitney U test. Used when two independent groups are compared without assuming normality.

Hypotheses: \(H_0\): the two populations have the same distribution. \(H_1\): one population tends to have larger values.

Procedure:

  1. Combine all observations from both groups and rank them from 1 to \(n_1 + n_2\), assigning average ranks to ties.
  2. Compute \(W_1\) = sum of ranks from Group 1.
  3. The test statistic is \(W = W_1\), or equivalently the Mann-Whitney statistic \(U_1 = W_1 - n_1(n_1+1)/2\).
  4. Compare \(W\) to the Wilcoxon rank-sum table, or compute the exact p-value (a normal approximation is used for large samples).
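The procedure can be sketched in base R as below. `rank_sum_stat()` is a hypothetical helper for illustration, not part of R. It also converts \(W_1\) to the Mann-Whitney statistic via \(U_1 = W_1 - n_1(n_1+1)/2\), which is the quantity R's `wilcox.test()` reports under the label `W`.

```r
# Hypothetical helper (not part of base R): rank-sum statistic and Mann-Whitney U
rank_sum_stat <- function(x, y) {
  n1 <- length(x)
  r <- rank(c(x, y))            # step 1: rank the pooled sample; ties get average ranks
  w1 <- sum(r[seq_len(n1)])     # step 2: sum of the ranks belonging to group 1
  u1 <- w1 - n1 * (n1 + 1) / 2  # step 3: equivalent Mann-Whitney U for group 1
  c(W1 = w1, U1 = u1)
}

group_a <- c(8, 12, 15, 10, 14)
group_b <- c(18, 20, 16, 22, 19)
rank_sum_stat(group_a, group_b) # W1 = 15, U1 = 0
```

\(U_1\) counts the pairs in which the group-1 value is the larger one (ties counting one half), so \(U_1 = 0\) here means no Treatment A patient took longer to recover than any Treatment B patient.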
Comparing recovery times: two treatments

Treatment A (\(n_1 = 5\)): recovery times 8, 12, 15, 10, 14 days. Treatment B (\(n_2 = 5\)): recovery times 18, 20, 16, 22, 19 days.

Combined and ranked (all 10 values together):

Value 8 10 12 14 15 16 18 19 20 22
Rank 1 2 3 4 5 6 7 8 9 10
Group A A A A A B B B B B

\(W_A = 1+2+3+4+5 = 15\). Expected under \(H_0\): \(n_1(n_1+n_2+1)/2 = 5 \times 11/2 = 27.5\).
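The expected value is the mean of \(W_1\) under \(H_0\). Its companion, the no-ties variance \(n_1 n_2 (n_1 + n_2 + 1)/12\), shows how many standard deviations 15 lies from 27.5. The z-score below only gives a sense of scale; with five observations per group, an exact p-value is preferable to a normal approximation.

```r
n1 <- 5
n2 <- 5
w_a <- 15
mean_w <- n1 * (n1 + n2 + 1) / 2        # 27.5
var_w  <- n1 * n2 * (n1 + n2 + 1) / 12  # 275/12, about 22.92 (no ties)
z <- (w_a - mean_w) / sqrt(var_w)       # about -2.61
round(c(mean = mean_w, var = var_w, z = z), 2)
```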

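\(W_A = 15\) is far below the expected 27.5; in fact every Treatment A time is shorter than every Treatment B time, so \(W_A\) takes its smallest possible value. Only 1 of the \(\binom{10}{5} = 252\) equally likely rank assignments gives a sum this small (and 1 gives the largest, 40), so the exact two-sided p-value is \(p = 2/252 \approx 0.008\): reject \(H_0\).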
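Under \(H_0\) every split of the ranks 1 to 10 into five for Treatment A and five for Treatment B is equally likely, so the exact null distribution of \(W_A\) can be enumerated with `combn()`. This brute-force sketch is for illustration only; R's `pwilcox()`, which works on the \(U\) scale, gives the same tail probability.

```r
n1 <- 5
n2 <- 5
w_obs <- 15
# Rank sum of group A for each of the choose(10, 5) = 252 possible rank sets
w_null <- combn(n1 + n2, n1, FUN = sum)
mean_w <- n1 * (n1 + n2 + 1) / 2
# Two-sided: rank sums at least as far from the null mean as the observed one
p_exact <- mean(abs(w_null - mean_w) >= abs(w_obs - mean_w))
p_exact                                         # 2/252, about 0.0079

# Same probability from R's exact Mann-Whitney distribution (U = 15 - 15 = 0)
2 * pwilcox(w_obs - n1 * (n1 + 1) / 2, n1, n2)  # 2/252
```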

Figure: stripcharts of recovery times for the two treatment groups, showing the difference in distributions.

Running the tests in R

Both tests use wilcox.test() in base R. The paired argument distinguishes them:

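# Signed-rank test (paired): differences are after - before; R reports V = W+ (here 4)
before <- c(7, 5, 8, 6, 9, 4, 7, 6)
after  <- c(4, 3, 5, 5, 6, 3, 5, 8)
# With default settings, the tied |d_i| make R fall back on a normal approximation (with a warning)
wilcox.test(after, before, paired = TRUE, alternative = "two.sided")

# Rank-sum test (independent): R reports W as the Mann-Whitney U = W_A - n1(n1+1)/2 (here 0)
group_a <- c(8, 12, 15, 10, 14)
group_b <- c(18, 20, 16, 22, 19)
wilcox.test(group_a, group_b, alternative = "two.sided")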

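According to the R documentation (?wilcox.test), by default an exact p-value is computed when the samples contain fewer than 50 values and there are no ties (and, for the paired test, no zero differences); otherwise R uses a normal approximation with a continuity correction. The pain-score differences contain ties, so the paired example falls back on the approximation, whereas the recovery-time data have no ties and get the exact \(p = 2/252 \approx 0.008\).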
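The normal approximation can be reproduced by hand to see what it involves. This sketch applies it to the paired pain data, using the tie-corrected null variance \(n(n+1)(2n+1)/24 - \sum (t^3 - t)/48\), where \(t\) runs over the sizes of groups of tied ranks, and a continuity correction of 0.5. It is intended to mirror R's own computation, and the last line checks the hand result against `wilcox.test(..., exact = FALSE)` with its default `correct = TRUE`.

```r
before <- c(7, 5, 8, 6, 9, 4, 7, 6)
after  <- c(4, 3, 5, 5, 6, 3, 5, 8)
d <- after - before
d <- d[d != 0]                               # drop zero differences
n <- length(d)
r <- rank(abs(d))
v <- sum(r[d > 0])                           # R's V statistic (= W+)

ties <- table(r)                             # sizes of the groups of tied ranks
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48)
z <- v - n * (n + 1) / 4                     # distance from the null mean
z <- (z - sign(z) * 0.5) / sigma             # continuity correction toward the mean
p_approx <- 2 * pnorm(-abs(z))
p_approx                                     # about 0.056

p_r <- wilcox.test(after, before, paired = TRUE, exact = FALSE)$p.value
all.equal(p_approx, p_r)                     # TRUE
```

With so few pairs the approximation (about 0.056) and the exact conditional value (\(14/256 \approx 0.055\)) happen to be close, but the two need not agree this well in general.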

⚠️ The signed-rank test assumes symmetric differences

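The Wilcoxon signed-rank test assumes that the differences \(d_i\) are symmetrically distributed. Under \(H_0\), each rank is then equally likely to carry a plus or a minus sign, which is what the null distribution of \(W\) is built on. If the differences are heavily skewed, the test still runs but the result may be misleading: a skewed distribution whose median is zero can still produce a large \(W^+\) (or \(W^-\)), so the test no longer tests the median alone.

For heavily skewed differences, the sign test is more appropriate: it uses only the direction of each difference, not its magnitude, and tests the median without any symmetry assumption.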
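A small simulation illustrates the point. It draws differences from a shifted exponential distribution, `rexp(n) - log(2)`, whose median is exactly zero but which is strongly right-skewed, so \(H_0\) about the median is true. The distribution, sample size, number of replications and seed are assumptions chosen for illustration; the exact rejection rates will vary with them.

```r
set.seed(1)
n_sim <- 2000                      # number of simulated data sets (assumption)
n <- 100                           # pairs per data set (assumption)
reject_wilcox <- logical(n_sim)
reject_sign   <- logical(n_sim)
for (i in seq_len(n_sim)) {
  d <- rexp(n) - log(2)            # median exactly 0, strongly right-skewed
  # Signed-rank test of H0: location 0 (one-sample form of wilcox.test)
  reject_wilcox[i] <- wilcox.test(d)$p.value < 0.05
  # Sign test: number of positive differences vs Binomial(n, 0.5)
  reject_sign[i] <- binom.test(sum(d > 0), n)$p.value < 0.05
}
c(signed_rank = mean(reject_wilcox), sign = mean(reject_sign))
```

Since the median really is zero, a valid test of the median should reject about 5% of the time or less. The signed-rank test rejects far more often, because the skew pushes the center it actually targets (the pseudomedian) above zero, while the sign test stays near or below the nominal level.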

💡 Wilcoxon vs t-test: when does it matter?

The Wilcoxon tests are robust alternatives when normality is violated, but:

  • For \(n \geq 30\), the \(t\)-test is fairly robust to non-normality thanks to the central limit theorem, and the Wilcoxon test often gives similar p-values.
  • For small \(n\) with clear non-normality or outliers, the Wilcoxon test is the safer choice.
  • When the data are normal, the Wilcoxon test has an asymptotic relative efficiency of \(3/\pi \approx 0.955\) relative to the \(t\)-test, so it needs only about 5% more observations to match the \(t\)-test's power: a very small efficiency loss for the robustness gained.
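The efficiency figure is asymptotic, but a quick simulation under normal data shows the same picture at a moderate sample size. The settings (20 observations per group, a true shift of 0.8 standard deviations, 2000 replications, the seed) are illustrative assumptions, and the estimated powers will fluctuate slightly with them.

```r
set.seed(42)
n_sim <- 2000                        # simulated data sets (assumption)
n <- 20                              # observations per group (assumption)
shift <- 0.8                         # true mean difference in SD units (assumption)
p_t <- numeric(n_sim)
p_w <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  x <- rnorm(n)
  y <- rnorm(n, mean = shift)
  p_t[i] <- t.test(x, y)$p.value       # Welch two-sample t-test
  p_w[i] <- wilcox.test(x, y)$p.value  # rank-sum test
}
est_power <- c(t = mean(p_t < 0.05), wilcoxon = mean(p_w < 0.05))
est_power
est_power[["wilcoxon"]] / est_power[["t"]]  # ratio of estimated powers
```

Both tests should show similar power, with the Wilcoxon estimate slightly lower. Swapping `rnorm()` for a heavy-tailed generator such as `rt(n, df = 3)` is a simple way to see the advantage shift toward the Wilcoxon test.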