
If the quantity of interest, usually a functional of a distribution, is reasonably smooth and your data are i.i.d., you're usually in pretty safe territory. Of course, there are other circumstances when the bootstrap will work as well.

What it means for the bootstrap to "fail"

Broadly speaking, the purpose of the bootstrap is to construct an approximate sampling distribution for the statistic of interest. It's not about actual estimation of the parameter. So, if the statistic of interest (under some rescaling and centering) is $\newcommand{\Xhat}{\hat{X}_n}\Xhat$ and $\Xhat \to X_\infty$ in distribution, we'd like our bootstrap distribution to converge to the distribution of $X_\infty$. If we don't have this, then we can't trust the inferences made.

The canonical example of bootstrap failure, even in an i.i.d. framework, arises when trying to approximate the sampling distribution of an extreme order statistic. Below is a brief discussion.

Maximum order statistic of a random sample from a $\;\mathcal{U}[0,\theta]$ distribution

Let $X_1, X_2, \ldots$ be a sequence of i.i.d. uniform random variables on $[0,\theta]$. Let $\newcommand{\Xmax}{X_{(n)}} \Xmax = \max_{1\leq k \leq n} X_k$. The distribution of $\Xmax$ is, for $0 \leq x \leq \theta$, $$ \renewcommand{\Pr}{\mathbb{P}}\Pr(\Xmax \leq x) = (x/\theta)^n \>. $$ (Note that by a very simple argument, this also shows that $\Xmax \to \theta$ in probability, and even almost surely if the random variables are all defined on the same space.)

An elementary calculation yields $$ \Pr( n(\theta - \Xmax) \leq x ) = 1 - \Big(1 - \frac{x}{\theta n}\Big)^n \to 1 - e^{-x/\theta} \>, $$ or, in other words, $n(\theta - \Xmax)$ converges in distribution to an exponential random variable with mean $\theta$.
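
For completeness, here is one way to spell out that elementary calculation. It uses only the distribution function of the maximum given above and the standard limit $(1 - a/n)^n \to e^{-a}$; the symbols are written out in full rather than through macros. For $0 \leq x \leq n\theta$,

$$
\begin{aligned}
\mathbb{P}\big( n(\theta - X_{(n)}) \leq x \big)
  &= \mathbb{P}\big( X_{(n)} \geq \theta - x/n \big) \\
  &= 1 - \mathbb{P}\big( X_{(n)} < \theta - x/n \big) \\
  &= 1 - \Big( \frac{\theta - x/n}{\theta} \Big)^n \\
  &= 1 - \Big( 1 - \frac{x}{\theta n} \Big)^n \>,
\end{aligned}
$$

and taking $a = x/\theta$ in the limit above gives $1 - e^{-x/\theta}$, the distribution function of an exponential random variable with mean $\theta$.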

Now, we form a (naive) bootstrap estimate of the distribution of $n(\theta - \Xmax)$ by resampling $X_1, \ldots, X_n$ with replacement to get $X_1^\star,\ldots,X_n^\star$ and using the distribution of $n(\Xmax - \Xmax^\star)$ conditional on $X_1,\ldots,X_n$.

But, observe that $\Xmax^\star = \Xmax$ with probability $1 - (1-1/n)^n \to 1 - e^{-1}$, and so the bootstrap distribution has a point mass at zero even asymptotically despite the fact that the actual limiting distribution is continuous.

More explicitly, though the true limiting distribution is exponential with mean $\theta$, the limiting bootstrap distribution places a point mass at zero of size $1-e^{-1} \approx 0.632$, independent of the actual value of $\theta$. By taking $\theta$ sufficiently large, we can make the probability that the true limiting distribution assigns to any fixed interval $[0,\varepsilon)$ arbitrarily small, yet the bootstrap will (still!) report that there is at least probability 0.632 in this interval! From this it should be clear that the bootstrap can behave arbitrarily badly in this setting.
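
The point mass is easy to see numerically. The following is a minimal simulation sketch, not part of the original argument: the function names `bootstrap_max_atom` and `true_prob_near_zero`, the sample size, the number of resamples, and the seed are all illustrative choices of mine. It estimates the fraction of bootstrap resamples for which $n(X_{(n)} - X_{(n)}^\star)$ is exactly zero and compares it with the exact finite-sample probability that $n(\theta - X_{(n)})$ falls in $[0, \varepsilon]$.

```python
import numpy as np

def bootstrap_max_atom(n, theta=1.0, n_boot=2000, seed=0):
    """Fraction of bootstrap resamples with n * (X_(n) - X*_(n)) == 0."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, theta, size=n)
    # Each row of idx is one resample of size n, drawn with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_max = x[idx].max(axis=1)
    # The statistic is exactly zero whenever the resample contains the sample maximum.
    stat = n * (x.max() - boot_max)
    return np.mean(stat == 0.0)

def true_prob_near_zero(n, theta, eps):
    """Exact P( n * (theta - X_(n)) <= eps ) for a U[0, theta] sample of size n."""
    return 1.0 - (1.0 - eps / (theta * n)) ** n

n, eps = 500, 0.1
for theta in (1.0, 100.0):
    print(
        f"theta={theta:6.1f}  "
        f"bootstrap mass at 0: {bootstrap_max_atom(n, theta):.3f}  "
        f"theory 1-(1-1/n)^n: {1 - (1 - 1 / n) ** n:.3f}  "
        f"true P(stat <= {eps}): {true_prob_near_zero(n, theta, eps):.4f}"
    )
```

The bootstrap mass at zero should sit near $1 - e^{-1}$ whatever the value of $\theta$, while the true probability near zero shrinks as $\theta$ grows.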

In summary, the bootstrap fails (miserably) in this case. Things tend to go wrong when dealing with parameters at the edge of the parameter space.

An example from a sample of normal random variables

There are other similar examples of the failure of the bootstrap in surprisingly simple circumstances.

Consider a sample $X_1, X_2, \ldots$ from $\mathcal{N}(\mu,1)$ where the parameter space for $\mu$ is restricted to $[0,\infty)$. The MLE in this case is $\newcommand{\Xhat}{\hat{X}_n}\newcommand{\Xbar}{\bar{X}}\Xhat = \max(\Xbar,0)$. Again, we use the bootstrap estimate $\Xhat^\star = \max(\Xbar^\star, 0)$. Again, it can be shown that, when the true mean is $\mu = 0$ (the boundary of the parameter space), the distribution of $\sqrt{n}(\Xhat^\star - \Xhat)$ (conditional on the observed sample) does not converge to the same limiting distribution as $\sqrt{n}(\Xhat - \mu)$.
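
A small simulation can make this failure visible in the boundary case $\mu = 0$. There, $\sqrt{n}(\max(\bar{X},0) - \mu)$ is exactly zero whenever $\bar{X} \leq 0$, which happens with probability $1/2$, so the true distribution has a point mass of $1/2$ at zero. The sketch below is my own illustration under that assumption; the name `bootstrap_boundary_atom`, the variable `mu_hat` for the restricted MLE, and the sample sizes and seed are hypothetical choices. It computes, for many independent samples, the mass the bootstrap distribution puts at zero.

```python
import numpy as np

def bootstrap_boundary_atom(x, n_boot, rng):
    """Bootstrap estimate of P*( sqrt(n) * (mu_hat* - mu_hat) == 0 | sample )."""
    n = x.size
    mu_hat = max(x.mean(), 0.0)                  # restricted MLE max(xbar, 0)
    idx = rng.integers(0, n, size=(n_boot, n))   # resample indices, with replacement
    mu_hat_star = np.maximum(x[idx].mean(axis=1), 0.0)
    t_star = np.sqrt(n) * (mu_hat_star - mu_hat)
    return np.mean(t_star == 0.0)

rng = np.random.default_rng(1)
n, n_rep, n_boot = 200, 100, 400   # illustrative sizes, chosen for speed

# One bootstrap point mass per independent N(0, 1) sample (true mu = 0).
atoms = np.array([
    bootstrap_boundary_atom(rng.normal(0.0, 1.0, size=n), n_boot, rng)
    for _ in range(n_rep)
])

print("true mass at zero: 0.5")
print("bootstrap mass at zero, first ten samples:", np.round(atoms[:10], 2))
print("spread across samples (std):", round(float(atoms.std()), 3))
```

Rather than settling near $1/2$, the bootstrap mass at zero should vary strongly from sample to sample: close to zero when $\bar{X} > 0$ and typically above $1/2$ when $\bar{X} < 0$.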

Exchangeable arrays

Perhaps one of the most dramatic examples is for an exchangeable array. Let $\newcommand{\bm}[1]{\mathbf{#1}}\bm{Y} = (Y_{ij})$ be an array of random variables such that, for every pair of permutation matrices $\bm{P}$ and $\bm{Q}$, the arrays $\bm{Y}$ and $\bm{P} \bm{Y} \bm{Q}$ have the same joint distribution. That is, permuting rows and columns of $\bm{Y}$ keeps the distribution invariant. (You can think of a two-way random effects model with one observation per cell as an example, though the model is much more general.)

Suppose we wish to construct a confidence interval for the mean $\mu = \mathbb{E}(Y_{ij}) = \mathbb{E}(Y_{11})$ (by the exchangeability assumption described above, the means of all the cells must be the same).

McCullagh (2000) considered two different natural (i.e., naive) ways of bootstrapping such an array. Neither of them gets the asymptotic variance of the sample mean correct. He also considered some examples of a one-way exchangeable array and linear regression.

References

Unfortunately, the subject matter is nontrivial, so none of these are particularly easy reads.