Bootstrap confidence intervals

Bootstrap confidence intervals use the bootstrap distribution of \(\hat{\theta}^*\) to construct a range of plausible values for the true parameter \(\theta\). Four main methods exist, differing in accuracy, assumptions, and computational cost. The BCa interval is the most accurate general-purpose choice.

Setup and notation

From \(B\) bootstrap resamples we have estimates \(\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B\). Let \(\hat{\theta}^*_{(\alpha)}\) denote the \(\alpha\)-quantile of the bootstrap distribution. The observed statistic from the original sample is \(\hat{\theta}\).

  1. Percentile interval

The simplest method: use the quantiles of the bootstrap distribution directly.

\[\text{CI}_\text{perc} = \left[\hat{\theta}^*_{(\alpha/2)},\; \hat{\theta}^*_{(1-\alpha/2)}\right]\]

Intuition: if the bootstrap distribution is centered at \(\hat{\theta}\) and symmetric, the percentile interval is a valid approximation. Implementation: sort the \(B\) bootstrap estimates and take the \(\lfloor B\alpha/2 \rfloor\) and \(\lceil B(1-\alpha/2) \rceil\) values.

Limitation: the percentile interval ignores bias. If the bootstrap distribution is shifted relative to the true sampling distribution (i.e., \(E[\hat{\theta}^*] \neq \hat{\theta}\) or \(E[\hat{\theta}] \neq \theta\)), the interval has incorrect coverage.

  1. Basic (reversed) interval

Also called the empirical or pivotal interval. It uses the bootstrap distribution of \(\hat{\theta}^* - \hat{\theta}\) as an approximation for the sampling distribution of \(\hat{\theta} - \theta\):

\[\text{CI}_\text{basic} = \left[2\hat{\theta} - \hat{\theta}^*_{(1-\alpha/2)},\; 2\hat{\theta} - \hat{\theta}^*_{(\alpha/2)}\right]\]

Intuition: reflect the bootstrap distribution around \(\hat{\theta}\). This corrects for bias in the location of the bootstrap distribution but not for skewness. The bounds are “reversed”: the upper bound uses the lower bootstrap quantile and vice versa.

  1. Studentized (bootstrap-t) interval

The most accurate interval when a standard error estimate is available. It standardizes the bootstrap statistic by its own SE:

\[t_b^* = \frac{\hat{\theta}^*_b - \hat{\theta}}{\widehat{\text{SE}}^*_b}\]

\[\text{CI}_\text{stud} = \left[\hat{\theta} - t^*_{(1-\alpha/2)} \cdot \widehat{\text{SE}},\; \hat{\theta} - t^*_{(\alpha/2)} \cdot \widehat{\text{SE}}\right]\]

where \(\widehat{\text{SE}}^*_b\) is the SE of \(\hat{\theta}^*_b\) estimated from a nested bootstrap within each resample (or analytically if available), and \(\widehat{\text{SE}}\) is the SE from the original sample.

Advantage: accounts for varying SE across bootstrap resamples, achieving second-order accuracy. Disadvantage: requires \(B \times B\) nested bootstrap or an analytical SE formula, making it computationally expensive.

  1. BCa interval (bias-corrected and accelerated)

The standard recommendation for general use. It adjusts the percentile interval for both bias and skewness using two quantities:

  • Bias-correction \(\hat{z}_0\): measures how far the bootstrap distribution is shifted from \(\hat{\theta}\).
  • Acceleration \(\hat{a}\): measures how the SE of \(\hat{\theta}\) changes with the true parameter value (estimated via jackknife).

\[\hat{z}_0 = \Phi^{-1}\!\left(\frac{\#\{\hat{\theta}^*_b < \hat{\theta}\}}{B}\right)\]

\[\hat{a} = \frac{\sum_{i=1}^n (\bar{\psi} - \psi_i)^3}{6\left[\sum_{i=1}^n (\bar{\psi} - \psi_i)^2\right]^{3/2}}, \quad \psi_i = \hat{\theta}_{(i)} \text{ (jackknife)}\]

\[\alpha_1 = \Phi\!\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{\alpha/2})}\right), \quad \alpha_2 = \Phi\!\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z_{1-\alpha/2})}\right)\]

\[\text{CI}_\text{BCa} = \left[\hat{\theta}^*_{(\alpha_1)},\; \hat{\theta}^*_{(\alpha_2)}\right]\]

When \(\hat{z}_0 = 0\) (no bias) and \(\hat{a} = 0\) (no skewness), BCa reduces to the percentile interval. BCa achieves second-order accuracy: its coverage error is \(O(n^{-1})\) vs \(O(n^{-1/2})\) for the percentile interval.

Comparison of the four methods

Four bootstrap confidence intervals compared for the sample correlation coefficient showing differences in width and location

The four intervals differ in location and width because they make different corrections for bias and skewness. For a near-boundary correlation (\(\hat{\rho} \approx 0.8\)), the distribution is left-skewed and BCa produces the most appropriate asymmetric interval.

Bootstrap distribution of the correlation with the four CI bounds marked

Which method to use

Method Accuracy Requires Best for
Percentile First-order \(B \geq 1000\) Quick estimates, symmetric distributions
Basic First-order \(B \geq 1000\) Corrects location bias
Studentized Second-order SE per resample Pivot-able statistics with available SE
BCa Second-order \(B \geq 5000\) + jackknife General purpose, asymmetric distributions

⚠️ The percentile interval can have poor coverage for skewed or biased statistics

The percentile interval assumes the bootstrap distribution has the same shape as the true sampling distribution. When \(\hat{\theta}\) is biased or its distribution is skewed (common for bounded statistics like correlations, proportions, and variances), the percentile interval has actual coverage below the nominal level.

For a correlation near \(\pm 1\), variances, and other bounded parameters, always prefer BCa over percentile. For the sample mean from a symmetric distribution, all four methods give similar results.

💡 Bootstrap CIs in R with the boot package

library(boot)

stat_fn <- function(data, idx) median(data[idx])
boot_obj <- boot(x, stat_fn, R = 5000)

# All four intervals at once
boot.ci(boot_obj, conf = 0.95,
        type = c("perc", "basic", "stud", "bca"))

For the studentized interval, the statistic function must return a vector c(estimate, variance). For BCa, at least \(B = 2000\) is recommended; \(B = 5000\) for stability in the tails.