
Wilcoxon test

The Wilcoxon tests are the standard nonparametric alternatives to the \(t\)-tests. The signed-rank test replaces the paired \(t\)-test, and the rank-sum test replaces the two-sample \(t\)-test. Both use ranks instead of raw values, making them robust to non-normality and outliers while retaining more power than the sign test.

Wilcoxon signed-rank test (paired samples)

Used when the same subjects are measured twice, or when observations are matched. It tests whether the median difference between pairs equals zero.

Hypotheses: \(H_0\): the median difference is zero. \(H_1\): the median difference is not zero (or greater/less for one-sided tests).

Assumption: the differences \(d_i = X_{i,\text{after}} - X_{i,\text{before}}\) are symmetrically distributed around their median (not necessarily normal).

Procedure:

Compute differences \(d_i\). Discard zeros; let \(n\) be the number of non-zero differences.
Rank the absolute differences \(|d_i|\) from 1 (smallest) to \(n\) (largest). Assign average ranks to ties.
Attach the original sign of each difference to its rank.
Compute \(W^+\), the sum of the ranks of the positive differences, and \(W^-\), the sum of the ranks of the negative differences (taken as a positive number), so that \(W^+ + W^- = n(n+1)/2\).
The test statistic is \(W = \min(W^+, W^-)\).
Compare \(W\) to the Wilcoxon signed-rank table, or compute the exact p-value from the null distribution of \(W\), in which each of the \(2^n\) assignments of signs to the ranks is equally likely.
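
The steps above can be sketched in a few lines of base R. The helper name `signed_rank_sums` and the example differences are hypothetical, chosen only for illustration; `rank()` assigns average ranks to ties by default.

```r
# Hypothetical helper (not part of base R): signed-rank sums for a vector
# of paired differences d, following the procedure step by step.
signed_rank_sums <- function(d) {
  d <- d[d != 0]                 # discard zero differences
  r <- rank(abs(d))              # rank |d|; ties receive average ranks
  w_plus  <- sum(r[d > 0])       # sum of ranks of positive differences
  w_minus <- sum(r[d < 0])       # sum of ranks of negative differences
  c(n = length(d), W_plus = w_plus, W_minus = w_minus,
    W = min(w_plus, w_minus))
}

# Made-up differences, including one zero that gets discarded
signed_rank_sums(c(1.2, -0.4, 2.5, 0, 0.9, -0.1, 3.1))
```

A quick sanity check on any such calculation is that `W_plus + W_minus` equals `n * (n + 1) / 2`.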

Pain scores before and after treatment

Eight patients rate pain (0-10) before and after physiotherapy:

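Patient	Before	After	\(d_i\)	\(|d_i|\)	Rank	Signed rank
1	7	4	-3	3	7.0	-7.0
2	5	3	-2	2	4.0	-4.0
3	8	5	-3	3	7.0	-7.0
4	6	5	-1	1	1.5	-1.5
5	9	6	-3	3	7.0	-7.0
6	4	3	-1	1	1.5	-1.5
7	7	5	-2	2	4.0	-4.0
8	6	8	+2	2	4.0	+4.0

The two differences with \(|d_i| = 1\) share ranks 1 and 2 (average 1.5), the three with \(|d_i| = 2\) share ranks 3 to 5 (average 4), and the three with \(|d_i| = 3\) share ranks 6 to 8 (average 7).

\(W^- = 7+4+7+1.5+7+1.5+4 = 32\), \(W^+ = 4\). Check: \(W^+ + W^- = 36 = 8 \times 9/2\). Test statistic: \(W = \min(32, 4) = 4\).

For \(n = 8\) and \(\alpha = 0.05\) (two-sided), the critical value is 3. Since \(W = 4 > 3\), do not reject \(H_0\) at the 5% level.

Seven of the eight patients reported less pain, but the evidence falls just short of two-sided significance: the exact p-value is \(14/256 \approx 0.055\), and the normal approximation with continuity correction gives about 0.056. A one-sided test for a reduction in pain gives \(p = 7/256 \approx 0.027\).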
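
The exact null distribution can be enumerated directly in base R as a cross-check on the hand calculation. This is a minimal sketch; the variable names are illustrative, and the enumeration is only practical for small \(n\) because it visits all \(2^n\) sign patterns.

```r
before <- c(7, 5, 8, 6, 9, 4, 7, 6)
after  <- c(4, 3, 5, 5, 6, 3, 5, 8)

d <- after - before
d <- d[d != 0]                       # discard zero differences (none here)
r <- rank(abs(d))                    # average ranks for tied |d|
n <- length(d)

w_plus <- sum(r[d > 0])
w_obs  <- min(w_plus, sum(r) - w_plus)

# Under H0 each of the 2^n sign patterns is equally likely.
# Each row of `signs` marks which ranks receive a positive sign.
signs       <- as.matrix(expand.grid(rep(list(c(0, 1)), n)))
w_plus_null <- as.vector(signs %*% r)
w_null      <- pmin(w_plus_null, sum(r) - w_plus_null)

# Two-sided p-value: share of patterns with W at least as extreme as observed
p_two_sided <- mean(w_null <= w_obs)
c(W = w_obs, p_two_sided = p_two_sided)
```

Because the ranks are computed from the data, ties are handled automatically: the enumeration uses the tied (average) ranks rather than 1 to \(n\).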

Dot plot showing paired before and after pain scores connected by lines with the direction of change highlighted

Wilcoxon rank-sum test (independent samples)

Also known as the Mann-Whitney U test. Used when two independent groups are compared without assuming normality.

Hypotheses: \(H_0\): the two populations have the same distribution. \(H_1\): one population tends to have larger values.

Procedure:

Combine all observations from both groups and rank them from 1 to \(n_1 + n_2\). Assign average ranks to ties.
Compute \(W_1\) = sum of ranks from Group 1.
The test statistic is \(W = W_1\), or equivalently the Mann-Whitney statistic \(U_1 = W_1 - n_1(n_1+1)/2\).
Compare to the Wilcoxon rank-sum table or compute the p-value.

Comparing recovery times: two treatments

Treatment A (\(n_1 = 5\)): recovery times 8, 12, 15, 10, 14 days. Treatment B (\(n_2 = 5\)): recovery times 18, 20, 16, 22, 19 days.

Combined and ranked (all 10 values together):

Value	8	10	12	14	15	16	18	19	20	22
Rank	1	2	3	4	5	6	7	8	9	10
Group	A	A	A	A	A	B	B	B	B	B

\(W_A = 1+2+3+4+5 = 15\). Expected under \(H_0\): \(n_1(n_1+n_2+1)/2 = 5 \times 11/2 = 27.5\).

\(W_A = 15\) is far below the expected 27.5, indicating Treatment A has consistently shorter recovery times. The exact p-value (two-sided) is \(p = 0.008\): reject \(H_0\).
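
The rank sum, its expectation, and the exact p-value can be reproduced in base R by enumerating every way of assigning \(n_1\) of the joint ranks to Treatment A. This is a sketch with illustrative variable names; it defines "at least as extreme" as at least as far from the expected rank sum, which is one common two-sided convention.

```r
group_a <- c(8, 12, 15, 10, 14)
group_b <- c(18, 20, 16, 22, 19)
n1 <- length(group_a)
n2 <- length(group_b)

r   <- rank(c(group_a, group_b))       # joint ranks 1..(n1 + n2)
w_a <- sum(r[seq_len(n1)])             # rank sum of Treatment A
u_a <- w_a - n1 * (n1 + 1) / 2         # Mann-Whitney U for Treatment A
w_expected <- n1 * (n1 + n2 + 1) / 2   # mean of W_A under H0

# Under H0 every choice of n1 ranks for group A is equally likely.
w_null <- combn(r, n1, FUN = sum)      # rank sums for all choose(10, 5) splits
p_two_sided <- mean(abs(w_null - w_expected) >= abs(w_a - w_expected))

c(W_A = w_a, U_A = u_a, expected = w_expected, p = p_two_sided)
```

Only 2 of the 252 possible splits (rank sums 15 and 40) are as extreme as the observed one, which is where \(p = 2/252 \approx 0.008\) comes from.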

Stripcharts of recovery times for two treatment groups showing the difference in distributions

Running the tests in R

Both tests use wilcox.test() in base R. The paired argument distinguishes them:

# Signed-rank test (paired)
before <- c(7, 5, 8, 6, 9, 4, 7, 6)
after  <- c(4, 3, 5, 5, 6, 3, 5, 8)
wilcox.test(after, before, paired = TRUE, alternative = "two.sided")

# Rank-sum test (independent)
group_a <- c(8, 12, 15, 10, 14)
group_b <- c(18, 20, 16, 22, 19)
wilcox.test(group_a, group_b, alternative = "two.sided")

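By R's documented default, an exact p-value is computed when a sample has fewer than 50 values and there are no ties (and, for the paired test, no zero differences); otherwise R uses a normal approximation with continuity correction. The paired example contains tied differences, so the default call falls back to the approximation and warns that an exact p-value cannot be computed with ties.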
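
The `exact` argument of `wilcox.test()` makes this choice explicit. The sketch below also shows which statistic R reports, since it differs from the hand-calculated \(W\); the object names are illustrative.

```r
before  <- c(7, 5, 8, 6, 9, 4, 7, 6)
after   <- c(4, 3, 5, 5, 6, 3, 5, 8)
group_a <- c(8, 12, 15, 10, 14)
group_b <- c(18, 20, 16, 22, 19)

# Paired data contain ties: request the normal approximation explicitly,
# which avoids the warning about exact p-values with ties.
paired_res <- wilcox.test(after, before, paired = TRUE, exact = FALSE)

# No ties and small samples: an exact p-value is available.
ranksum_res <- wilcox.test(group_a, group_b, exact = TRUE)

# Hand-computed counterparts of the statistics R reports
d      <- after - before
w_plus <- sum(rank(abs(d))[d > 0])                 # W+ for after - before
n1     <- length(group_a)
u_a    <- sum(rank(c(group_a, group_b))[seq_len(n1)]) - n1 * (n1 + 1) / 2

c(V_reported = unname(paired_res$statistic), W_plus = w_plus)
c(W_reported = unname(ranksum_res$statistic), U_a = u_a)
```

R's `V` is \(W^+\) for `x - y`, not \(\min(W^+, W^-)\), and its two-sample `W` is the Mann-Whitney \(U\) of the first sample, so neither should be compared directly with a table of \(\min(W^+, W^-)\) or of rank sums.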

⚠️ The signed-rank test assumes symmetric differences

The Wilcoxon signed-rank test assumes that the differences \(d_i\) are symmetrically distributed. If the differences are heavily skewed, the test still runs but the result may be misleading: even when the median difference is zero, skewness makes large ranks more likely on one side, so \(W^+\) and \(W^-\) no longer have the same distribution under \(H_0\).

For heavily skewed differences, the sign test is more appropriate since it only uses the direction, not the magnitude.
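
Base R has no dedicated sign-test function, but one common way to run it is as a binomial test on the number of positive differences. The sketch below applies this to the pain data; it is an illustration with made-up variable names, not a recommendation for that particular data set.

```r
before <- c(7, 5, 8, 6, 9, 4, 7, 6)
after  <- c(4, 3, 5, 5, 6, 3, 5, 8)

d <- after - before
d <- d[d != 0]                 # zero differences carry no sign information

n_pos <- sum(d > 0)            # number of patients whose score increased

# Under H0 (median difference zero) each sign is + with probability 0.5
sign_res <- binom.test(n_pos, length(d), p = 0.5, alternative = "two.sided")
sign_res$p.value
```

With one positive difference out of eight, the two-sided p-value is \(18/256 \approx 0.070\), larger than the signed-rank p-value because the sign test ignores the magnitudes.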

💡 Wilcoxon vs t-test: when does it matter?

The Wilcoxon tests are robust alternatives when normality is violated, but:

For \(n \geq 30\), the \(t\)-test is fairly robust to moderate non-normality via the CLT, and the Wilcoxon test usually gives similar p-values.
For small \(n\) with clear non-normality or outliers, Wilcoxon is the safer choice.
When the data are normal, the Wilcoxon test has an asymptotic relative efficiency of \(3/\pi \approx 0.955\) against the \(t\)-test: it needs only about 5% more observations to reach the same power, a very small loss for the robustness gained.
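
A rough simulation can make the efficiency comparison concrete. The settings below (20 observations per group, a shift of one standard deviation, 2000 replications, seed 1) are arbitrary assumptions for illustration, and the estimated powers will vary with them.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n_sim <- 2000                      # number of simulated data sets
n     <- 20                        # observations per group
shift <- 1                         # true difference in means (in SD units)

# For each simulated pair of normal samples, record both p-values
p_vals <- replicate(n_sim, {
  x <- rnorm(n)
  y <- rnorm(n, mean = shift)
  c(t = t.test(x, y)$p.value, wilcoxon = wilcox.test(x, y)$p.value)
})

# Estimated power: share of simulations rejecting H0 at the 5% level
power <- rowMeans(p_vals < 0.05)
power
```

With normal data the two estimated powers should be close, with the \(t\)-test slightly ahead; replacing `rnorm()` with a heavy-tailed generator such as `rt(n, df = 2)` is one way to see the ordering change.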