Wilcoxon test
The Wilcoxon tests are the standard nonparametric alternatives to the \(t\)-tests. The signed-rank test replaces the paired \(t\)-test, and the rank-sum test replaces the two-sample \(t\)-test. Both use ranks instead of raw values, making them robust to non-normality and outliers while retaining more power than the sign test.
Wilcoxon signed-rank test (paired samples)
Used when the same subjects are measured twice, or when observations are matched. It tests whether the median difference between pairs equals zero.
Hypotheses: \(H_0\): the median difference is zero. \(H_1\): the median difference is not zero (or greater/less for one-sided tests).
Assumption: the differences \(d_i = X_{i,\text{after}} - X_{i,\text{before}}\) are symmetrically distributed around their median (not necessarily normal).
Procedure:
- Compute differences \(d_i\). Discard zeros.
- Rank the absolute differences \(|d_i|\) from 1 (smallest) to \(n\) (largest). Assign average ranks to ties.
- Attach the original sign of each difference to its rank.
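- Compute \(W^+\), the sum of the ranks attached to positive differences, and \(W^-\), the sum of the ranks attached to negative differences. Both are taken as positive numbers, so \(W^+ + W^- = n(n+1)/2\).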
- The test statistic is \(W = \min(W^+, W^-)\).
- Compare \(W\) to the Wilcoxon signed-rank table, or compute the exact p-value from the null distribution of \(W\), under which each of the \(2^n\) assignments of signs to the ranks is equally likely.
Eight patients rate pain (0-10) before and after physiotherapy:
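| Patient | Before | After | \(d_i\) | \(\lvert d_i \rvert\) | Rank | Signed rank |
|---|---|---|---|---|---|---|
| 1 | 7 | 4 | -3 | 3 | 7.0 | -7.0 |
| 2 | 5 | 3 | -2 | 2 | 4.0 | -4.0 |
| 3 | 8 | 5 | -3 | 3 | 7.0 | -7.0 |
| 4 | 6 | 5 | -1 | 1 | 1.5 | -1.5 |
| 5 | 9 | 6 | -3 | 3 | 7.0 | -7.0 |
| 6 | 4 | 3 | -1 | 1 | 1.5 | -1.5 |
| 7 | 7 | 5 | -2 | 2 | 4.0 | -4.0 |
| 8 | 6 | 8 | +2 | 2 | 4.0 | +4.0 |
The two differences with \(\lvert d_i \rvert = 1\) share ranks 1 and 2 (average 1.5), the three with \(\lvert d_i \rvert = 2\) share ranks 3 to 5 (average 4), and the three with \(\lvert d_i \rvert = 3\) share ranks 6 to 8 (average 7).
\(W^- = 7+4+7+1.5+7+1.5+4 = 32\), \(W^+ = 4\). Check: \(W^+ + W^- = 36 = 8 \times 9/2\). Test statistic: \(W = \min(32, 4) = 4\).
For \(n = 8\) and \(\alpha = 0.05\) (two-sided), the critical value is 3. Since \(W = 4 > 3\), do not reject \(H_0\).
Pain scores fell for seven of the eight patients, but the two-sided test falls just short of significance at the 5% level (\(p \approx 0.055\)). A one-sided test for a reduction in pain would give \(p \approx 0.027\).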
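The hand calculation can be checked with a few lines of R. The following is a minimal sketch, not a replacement for wilcox.test(): it recomputes the ranks, \(W^+\), \(W^-\) and an exact p-value for the pain data by enumerating every sign pattern. The variable names are illustrative, and the p-value is conditional on the observed ties, so it may differ slightly from table-based or normal-approximation values.

```r
# Pain scores from the table above
before <- c(7, 5, 8, 6, 9, 4, 7, 6)
after  <- c(4, 3, 5, 5, 6, 3, 5, 8)

d <- after - before
d <- d[d != 0]                 # discard zero differences
r <- rank(abs(d))              # rank() assigns average ranks to ties by default

w_plus  <- sum(r[d > 0])       # sum of ranks of positive differences
w_minus <- sum(r[d < 0])       # sum of ranks of negative differences
w       <- min(w_plus, w_minus)

# Exact null distribution: under H0 each rank is equally likely to carry
# a plus or a minus sign, so enumerate all 2^n sign patterns.
n <- length(d)
signs <- as.matrix(expand.grid(rep(list(c(0, 1)), n)))   # 2^n rows, n columns
w_plus_null <- as.vector(signs %*% r)

# Two-sided p-value: the null distribution of W+ is symmetric, so double
# the lower tail at the observed minimum (capped at 1).
p_exact <- min(1, 2 * mean(w_plus_null <= w))

data.frame(d = d, rank = r)
c(w_plus = w_plus, w_minus = w_minus, w = w, p_exact = p_exact)
```

Full enumeration is only practical for small \(n\), since the number of sign patterns doubles with every additional pair.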

Wilcoxon rank-sum test (independent samples)
Also known as the Mann-Whitney U test. Used when two independent groups are compared without assuming normality.
Hypotheses: \(H_0\): the two populations have the same distribution. \(H_1\): one population tends to have larger values.
Procedure:
- Combine all observations from both groups and rank them from 1 to \(n_1 + n_2\). Assign average ranks to ties.
- Compute \(W_1\) = sum of ranks from Group 1.
- The test statistic is \(W = W_1\), or equivalently the Mann-Whitney statistic \(U = W_1 - n_1(n_1+1)/2\).
- Compare to the Wilcoxon rank-sum table or compute the p-value.
Treatment A (\(n_1 = 5\)): recovery times 8, 12, 15, 10, 14 days. Treatment B (\(n_2 = 5\)): recovery times 18, 20, 16, 22, 19 days.
Combined and ranked (all 10 values together):
| Value | 8 | 10 | 12 | 14 | 15 | 16 | 18 | 19 | 20 | 22 |
|---|---|---|---|---|---|---|---|---|---|---|
| Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| Group | A | A | A | A | A | B | B | B | B | B |
\(W_A = 1+2+3+4+5 = 15\). Expected under \(H_0\): \(n_1(n_1+n_2+1)/2 = 5 \times 11/2 = 27.5\).
\(W_A = 15\) is far below the expected 27.5, indicating Treatment A has consistently shorter recovery times. The exact p-value (two-sided) is \(p = 0.008\): reject \(H_0\).
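The same numbers can be reproduced in R. This sketch, with illustrative variable names, computes the rank sum, its expected value and the Mann-Whitney \(U\). It then obtains the exact two-sided p-value by enumerating every way of assigning 5 of the 10 ranks to Group A, which is feasible only because the samples are small.

```r
group_a <- c(8, 12, 15, 10, 14)
group_b <- c(18, 20, 16, 22, 19)

n1 <- length(group_a)
n2 <- length(group_b)

r <- rank(c(group_a, group_b))        # joint ranks, average ranks for ties
w_a <- sum(r[seq_len(n1)])            # rank sum of Group A
expected <- n1 * (n1 + n2 + 1) / 2    # mean of w_a under H0
u_a <- w_a - n1 * (n1 + 1) / 2        # Mann-Whitney U for Group A

# Exact null distribution: under H0 every choice of n1 ranks out of
# n1 + n2 is equally likely to belong to Group A.
w_null <- combn(r, n1, FUN = sum)

# Two-sided p-value: share of rank sums at least as far from the
# expected value as the observed one.
p_exact <- mean(abs(w_null - expected) >= abs(w_a - expected))

c(w_a = w_a, expected = expected, u_a = u_a, p_exact = p_exact)
```

Only the two most extreme assignments (Group A holding the five smallest or the five largest ranks) are as far from 27.5 as the observed rank sum, which is where the small p-value comes from.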

Running the tests in R
Both tests use wilcox.test() in base R. The paired argument distinguishes them:
```r
# Signed-rank test (paired); the reported V is W+ for the differences after - before
before <- c(7, 5, 8, 6, 9, 4, 7, 6)
after <- c(4, 3, 5, 5, 6, 3, 5, 8)
wilcox.test(after, before, paired = TRUE, alternative = "two.sided")

# Rank-sum test (independent); the reported W is the Mann-Whitney U of group_a
group_a <- c(8, 12, 15, 10, 14)
group_b <- c(18, 20, 16, 22, 19)
wilcox.test(group_a, group_b, alternative = "two.sided")
```
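By default, R computes exact p-values when the samples contain fewer than 50 values and there are no ties (nor zero differences in the paired case). Otherwise it falls back to a normal approximation with continuity correction. The pain scores above contain ties, so the paired call uses the approximation and warns that an exact p-value cannot be computed. The exact and correct arguments override these defaults.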
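To make the normal approximation concrete, the sketch below applies it by hand to the rank-sum example and compares the result with wilcox.test(). It assumes no ties and the default continuity correction, and the variable names are illustrative. With only five observations per group the approximation is used here purely for illustration.

```r
group_a <- c(8, 12, 15, 10, 14)
group_b <- c(18, 20, 16, 22, 19)
n1 <- length(group_a)
n2 <- length(group_b)

# Mann-Whitney U of the first sample (the statistic wilcox.test reports as W)
u <- sum(rank(c(group_a, group_b))[seq_len(n1)]) - n1 * (n1 + 1) / 2

mu    <- n1 * n2 / 2                          # null mean of U
sigma <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # null SD of U (no ties)

# Continuity correction: move U half a unit towards the null mean
z <- (u - mu - sign(u - mu) * 0.5) / sigma
p_manual <- 2 * pnorm(-abs(z))

p_approx <- wilcox.test(group_a, group_b, exact = FALSE)$p.value
p_exact  <- wilcox.test(group_a, group_b, exact = TRUE)$p.value

c(manual = p_manual, approx = p_approx, exact = p_exact)
```

The hand-computed and built-in approximations should agree, while the exact p-value is smaller here, a reminder that the approximation can be noticeably off for very small samples.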
⚠️ The signed-rank test assumes symmetric differences
The Wilcoxon signed-rank test assumes that the differences \(d_i\) are symmetrically distributed. If the differences are heavily skewed, the test still runs but the result may be misleading: a skewed distribution of differences means \(W^+\) and \(W^-\) are not exchangeable under \(H_0\).
For heavily skewed differences, the sign test is more appropriate since it only uses the direction, not the magnitude.
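Base R has no dedicated sign-test function, but the test reduces to a binomial test on the number of positive differences. A minimal sketch using binom.test(), shown on the pain scores purely for illustration (the variable names are assumptions, not part of any package):

```r
before <- c(7, 5, 8, 6, 9, 4, 7, 6)
after  <- c(4, 3, 5, 5, 6, 3, 5, 8)

d <- after - before
d <- d[d != 0]          # zero differences carry no direction

n_pos <- sum(d > 0)     # number of positive differences
n     <- length(d)      # number of non-zero differences

# Under H0 (median difference zero), n_pos follows Binomial(n, 0.5)
sign_test <- binom.test(n_pos, n, p = 0.5, alternative = "two.sided")
sign_test$p.value
```

Because the sign test ignores the magnitudes of the differences, it needs no symmetry assumption, at the cost of lower power when the differences are in fact symmetric.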
💡 Wilcoxon vs t-test: when does it matter?
The Wilcoxon tests are robust alternatives when normality is violated, but:
- For \(n \geq 30\), the \(t\)-test is fairly robust to non-normality via the CLT. The Wilcoxon test gives similar p-values.
- For small \(n\) with clear non-normality or outliers, Wilcoxon is the safer choice.
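- When the data are normal, the Wilcoxon test has an asymptotic relative efficiency of \(3/\pi \approx 0.955\) relative to the \(t\)-test: it needs only about 5% more observations to reach the same power, a very small price for the robustness gained.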
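A small simulation can illustrate how close the two tests are under normality. The settings below (sample size, shift, number of replications, seed) are arbitrary assumptions chosen for illustration, and the estimated powers carry simulation error, so treat the output as a rough check rather than a precise efficiency estimate.

```r
set.seed(1)        # arbitrary seed, for reproducibility only

n_sim <- 2000      # number of simulated data sets
n     <- 20        # observations per group
shift <- 0.8       # true mean difference, in standard deviation units

# For each simulated data set, record whether each test rejects at the 5% level
reject <- replicate(n_sim, {
  x <- rnorm(n)
  y <- rnorm(n, mean = shift)
  c(t = t.test(x, y)$p.value < 0.05,
    wilcoxon = wilcox.test(x, y)$p.value < 0.05)
})

est_power <- rowMeans(reject)   # proportion of rejections for each test
est_power
```

Replacing rnorm() with a heavy-tailed generator such as rt(n, df = 3) is one way to see the comparison shift in favour of the Wilcoxon test.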