
If a coin is biased to land heads with probability $p$ and $(a,b)$ is a $95\%$ confidence interval for $p$ then $p$ is in $(a,b)$ with probability $95\%$.

Added in edit - While it is often argued that the difference is only philosophical, this distinction is of huge practical importance, because the latter phrase is usually interpreted as if $p$ were the random variable, while in a non-Bayesian setting $(a,b)$ is the only random variable. In a Bayesian setting, $p$ is a random variable, but the probability that it belongs to the confidence interval depends on the prior. To make things clearer, let us adapt the first item of Noah's answer to this case.

Imagine the coin is drawn uniformly at random from a jar known to contain coins for which $p=5\%$ and coins for which $p=95\%$ (and assume the coins self-destruct after being flipped, otherwise we could flip the same coin a number of times; for a more realistic framework, see Noah's answer).

Then, if we know the two possible values (but not the proportions of the two kinds of coins in the jar), a natural confidence interval $I$ (which, I recall, is far from unique) is $I=\{0.95\}$ if we observe heads and $I=\{0.05\}$ if we observe tails. We could replace these singletons by small intervals around the same values if we so decided.

This $I$ is random, as it should be: it depends on the random outcome of the experiment. Let us show that it indeed gives a $95\%$ confidence interval. Consider all possible situations and their probabilities, denoting by $a$ the proportion of $p=0.95$ coins:

- we drew a $p=0.95$ coin and got heads (probability $0.95\, a$),
- we drew a $p=0.05$ coin and got heads (probability $0.05\,(1-a)$),
- we drew a $p=0.95$ coin and got tails (probability $0.05\, a$),
- we drew a $p=0.05$ coin and got tails (probability $0.95\,(1-a)$).

Precisely in the first and last cases does $I$ contain $p$, and these probabilities sum to $0.95\,a+0.95\,(1-a)=95\%$: our design for $I$ indeed ensures that in $95\%$ of the cases, the experiment leads us to choose an $I$ that contains $p$.
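One can check this coverage claim empirically. The sketch below (a minimal simulation; the function name and the choice of jar compositions are mine, not part of the argument above) draws a coin from the jar, flips it once, forms $I$ as described, and counts how often $I$ contains $p$:

```python
import random

def coverage(a, trials=100_000, seed=0):
    """Estimate P(I contains p) when a fraction `a` of the coins
    in the jar have p = 0.95 and the rest have p = 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = 0.95 if rng.random() < a else 0.05   # draw a coin from the jar
        heads = rng.random() < p                  # flip it once
        I = 0.95 if heads else 0.05               # our "interval" {0.95} or {0.05}
        hits += (I == p)
    return hits / trials

# Coverage is about 95% whatever the composition of the jar:
for a in (0.001, 0.5, 0.999):
    print(a, coverage(a))
```

Note that the coverage does not depend on $a$: that is exactly the frequentist guarantee, and it holds even though, as shown next, the conditional probability after observing the outcome can be very different.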

Let us go further: if the outcome is heads, we take $I=\{0.95\}$ and the a posteriori probability that $I$ contains the actual value of $p$ is $$ \frac{0.95\, a}{0.95\, a+0.05\,(1-a)}=\frac{0.95\, a}{0.90\, a+0.05}$$ which can be anywhere between $0$ and $1$.

For example, assume that $a=1/1000$. If heads turns up, the (conditional) probability that $I$ contains $p$ is less than $2\%$: with this overwhelming prior, a "heads" outcome is much more likely to result from a bit of bad luck with a $p=0.05$ coin than from a very rare $p=0.95$ coin.
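The posterior formula above is a one-liner; the sketch below (function name mine) evaluates it and confirms the sub-$2\%$ figure for $a=1/1000$:

```python
def posterior_hit_prob(a):
    """P(p = 0.95 | heads), i.e. the conditional probability that
    I = {0.95} contains p, given the prior proportion a of p = 0.95 coins.
    This is 0.95*a / (0.95*a + 0.05*(1-a)) = 0.95*a / (0.90*a + 0.05)."""
    return 0.95 * a / (0.90 * a + 0.05)

print(posterior_hit_prob(1 / 1000))  # about 0.0187, below 2%
print(posterior_hit_prob(0.5))       # 0.95 only when the jar is balanced... no: 0.95*0.5/0.5 = 0.95
```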

This example might seem artificial, but when a scientist assumes that $p$ is in her or his computed confidence interval with $95\%$ probability, as if $p$ were random, it may lead to false interpretations, notably in the presence of bias in the experiments. I thus prefer to say "$I$ lands around $p$ with $95\%$ probability" to make the randomness of the confidence interval more explicit.