$\begingroup$

Thanks to everyone for the interesting answers. Here's an improved heuristic that makes use of various parts of them -- the basic approach from Matthew, Fixee's observation of the repeated digits, and Numth' observations about restrictions on the digits. It's still a bit off, but I think it may be further improved taking into account Numth' observation 2; I've only taken into account observation 1 (mod $2$ and $5$) so far; I think the mod $3$ part will need to be a bit more subtle; more on that later.

Consider numbers with $m$ digits consisting of $k$ strings of repeated digits. For instance, $38800999$ would have $m=8$ and $k=4$. The number of such numbers is $9^k\left(m-1\atop k-1\right)$: We have $9$ choices for the first repeated digit (we can't use $0$) and $9$ choices for the remaining ones (we can't use the previous one), and there are $\left(m-1 \atop k-1\right)$ different way to divide the repetitions. To check that this is right, we can calculate the total number of numbers with $m$ digits:

$$\sum_{k=1}^m 9^k\left({m-1 \atop k-1}\right)=9\sum_{k=0}^{m-1} 9^k\left({m-1 \atop k}\right)=9(1+9)^{m-1}=9\cdot10^{m-1}\;,$$

which is correct.

Like Matthew, I'll use $1/\log x$ as the "probability" of a number to be prime. This isn't quite right, since this is the total density below $x$, whereas we need the marginal density, $(x/\log x)'=1/\log x - 1/\log^2 x$. I'll do the analytic calculations using just $1/\log x$, which gives an upper bound and is thus good enough to show that the sum converges, but I'll give some numerical results using the more precise formula in the end.

So the probability of one of these numbers being prime is $1/\log x$, which we can bound from above by $1/\log 10^m=1/(m\log 10)$. Now we need the probability that the $k$ different numbers that we can get by deleting one of the digits are also prime. Note that for all but the last digit, the deletion doesn't change the last digit.

One of the main reasons why Matthew's heuristic significantly underestimates the abundance of these numbers is that we need to include a factor of $5/2$ for each digit except the last: For each of these digits, given that the original number is prime and the modified number has the same last digit, we already know it isn't divisible by $2$ or $5$, which raises its chances of being prime by a factor of $10/|\{1,3,7,9\}|=5/2$.

So multiplying all these probabilities, we have one factor of $1/(m\log 10)$ for the original number to be prime, $k$ factors of $1/(m\log 10)$ for all the modified numbers to be prime, and $k-1$ factors of $5/2$ to account for the last digit. (We could bound the probability for the modified numbers by $1/((m-1)\log 10)$, but I want to keep things simple first to derive the convergence; I'll get back to that in the numerical estimates).

Now we have all we need to write an upper bound for the sum of the "probabilities" over all numbers:

$$ \begin{eqnarray} && \sum_{m=2}^{\infty}\sum_{k=1}^{m}9^k\left({m-1 \atop k-1}\right)\left(\frac{1}{m\log10}\right)^{k+1}\left(\frac{10}{4}\right)^{k-1} \\ &=& \frac{9}{\log^2 10}\sum_{m=2}^{\infty}\frac{1}{m^2}\sum_{k=0}^{m-1}\left({m-1 \atop k}\right)\left(\frac{45}{2m\log10}\right)^k \\ &=& \frac{9}{\log^2 10}\sum_{m=2}^{\infty}\frac{1}{m^2}\left(1+\frac{45}{2m\log10}\right)^{m-1} \\ &<& \frac{9}{\log^2 10}\sum_{m=2}^{\infty}\frac{1}{m^2}\mathrm{e}^{45/(2\log10)} \\ &=& \frac{9}{\log^2 10}\left(\frac{\pi^2}{6}-1\right)\mathrm{e}^{45/(2\log10)} \\ &\approx& 19191 \;. \end{eqnarray} $$

Although this is quite a lot bigger than Matthew's result, it's still finite.

The bound is not very tight, for three reasons: I didn't use the fact that the modified numbers have one fewer digit; I dropped the $1/\log^2x$ term; and the exponential bound isn't tight for small $m$. To get a better estimate, let's keep the $1/\log^2x$ term and the power instead of the exponential, and let's roughly estimate the probabilities of the original and modified numbers being prime by using $1/((m-1)\log10)$ for all of them -- that overestimates the probability for the original number and underestimates it for the modified numbers, so should give a better estimate. Putting this all together yields the following estimate for the number of $m$-digit numbers, Fixee's $c(m)$, with $j:=m-1$:

$$9\left(\frac{1}{j\log10}-\frac{1}{(j\log 10)^2}\right)^2\left(1+\frac{45}{2}\left(\frac{1}{j\log10}-\frac{1}{(j\log 10)^2}\right)\right)^j\;.$$

Here's a plot. The numbers are reasonably close to the actual ones, but still too low, even at $m=9$, i.e. $j=8$, where the approximations should be reasonable and the maximum is almost attained, but the value is around 11 whereas the actual values are all around 16.

P.S.: I think I figured out how to correctly take into account the restrictions mod $3$; I'll be posting that later in the day.

P.P.S.: I just realized that part of the remaining underestimation stems from the fact that the last digit should also get a factor of $5/2$ if it's repeated. I think that together with the mod $3$ part might get the result roughly in line with the actual numbers.

[Update:]

I've produced some more numerical data and improved the estimate as described, and the two are now in quite satisfactory agreement.

For taking into account the restrictions mod $2$, $3$ and $5$, the general approach is to take the original number as given, with the generic probability of being prime, and then to analyze the modified numbers under the condition that the original number is prime. We need to distinguish two cases depending on whether the last digit is repeated. The repeated case is easier to analyze, so I'll treat that first.

So assume that the last digit is repeated. That excludes one place at which to divide the repetitions, so there are $9^k\left(m-2 \atop k-1\right)$ such numbers. In this case, each digit, including the last one, gets a factor of $5/2$ to account for the fact that the corresponding modified number is not divisible by $2$ or $5$. To help in getting the more complicated case of a non-repeated last digit right, it's worthwhile stating more explicitly how this factor of $5/2$ arises from a conditional probability. We can consider the probabilities that a number is coprime to all primes in a set $S$ and that it is coprime to all primes not in $S$ as independent, so that the probability of the number being prime is the product of these two probabilities. Then the probability of a number being coprime to $2$ and $5$ is $\lvert\{1,3,7,9\}/10\rvert=2/5$, and the probability of the number being prime is $p=2/5q$, where $q$ is the probability of it being coprime to all primes other than $2$ and $5$. If we estimate $p$ using the overall density of primes but we know that the number is coprime to $2$ or $5$, then the conditional probability of it being prime is $q=(5/2)p$. This is relatively obvious in the present case, but this way of looking at it will be helpful in dealing with the case of a non-repeated last digit.

Now let's look at the mod $3$ restrictions. The last digit is known to be $1$, $3$, $7$ or $9$. If we delete a $3$ or a $9$, the number will still not be divisible by $3$. If we delete a $1$ or a $7$, there is a 50% chance of the number becoming divisible by $3$. Thus, the conditional probability of the modified number being coprime to $3$ is $3/4$, and this has to be divided by the unconditional probability of it being coprime to $3$, which is $2/3$, to obtain the factor $(3/4)/(2/3)=9/8$ by which we need to multiply the unconditional probability estimate.

Now consider the remaining digits, starting from the end. Each digit is different from the one after it, and the one after it is known to be one of the $7$ digits allowed mod $3$. That leaves $6$ allowed digits and $3$ forbidden ones, for a conditional probability of $2/3$. This is equal to the unconditional probability, so we don't need to include any factors to account for the mod $3$ restrictions on the remaining digits.

That completes the considerations for the case of a repeated last digit. Let's call the estimate for the unconditional probability of a number with $m$ digits to be prime $p_m$ (more on that below); then we can write the sum of the probabilities for all $m$-digit numbers with repeated last digit as

$$ \begin{eqnarray} && \frac{9}{8}p_m\sum_{k=1}^{m-1}9^k{\left(m-2 \atop k-1\right)} \left(\frac{5}{2}\right)^kp_{m-1}^k \\ &=& \frac{9}{8}\frac{5}{2}9p_mp_{m-1}\sum_{k=0}^{m-2}9^k{\left(m-2 \atop k\right)} \left(\frac{5}{2}\right)^kp_{m-1}^k \\ &=& \frac{405}{16}p_mp_{m-1}\left(1+\frac{45}{2}p_{m-1}\right)^{m-2}\;, \end{eqnarray} $$

where the orginal sum now only runs up to $m-1$ since the last digit cannot be repeated for $k=m$.

Now let's turn to the slightly more complicated case where the last digit isn't repeated. That reduces both the number of digits and the number of places at which to divide the repetitions, so there are $9^k\left(m-2 \atop k-2\right)$ such numbers. The sum works out as it should, since $\left(m-2 \atop k-1\right)+\left(m-2 \atop k-2\right)=\left(m-1 \atop k-1\right)$.

If the last digit doesn't repeat, we don't know whether deleting the last digit makes the number divisible by $2$ or $5$, and there is correlation between the events of the last deletion leaving the number coprime to $2$ and $5$ and the last two deletions leaving it coprime to $3$. Thus we need to determine the conditional probability that all three of these events occur given that the original number is prime.

Again, the last digit can be $1$, $3$, $7$ or $9$. But now the penultimate digit also has to be one of these, since otherwise the deletion of the last digit would render the number divisible by $2$. Thus we have $16$ combinations for the last two digits, $4$ of which are excluded because the digits cannot be the same. We have two different cases to consider, depending on whether $1$ and $7$ (which have the same remainder mod $3$) are allowed or not. Each of these cases occurs with probability $1/2$, depending on whether the remainder of the original number mod $3$ is $1$ or $2$. (Strictly speaking, the probabilities for these two cases are also slightly correlated with the probabilities being determined, but I believe this correlation should decay quickly as $m$ increases.)

So with probability $1/2$, only $3$ and $9$ are allowed, so the only combinations for the last two digits are $39$ and $93$. In the other case, all $4$ digits, and hence all $12$ combinations, are allowed. Thus, on average $7$ combinations of the last two digits lead to the last two deletions leaving the number coprime to $2$, $3$ and $5$, out of the $4\cdot9=36$ that are possible given that the original number is prime. That yields a conditional probability $7/36$, which we need to divide by the unconditional probability $(\lvert\{1,7,11,13,17,19,23,29\}\rvert/30)^2=(4/15)^2$ of two numbers being coprime to $2$, $3$ and $5$ to get the factor $(7/36)(15/4)^2=175/64$ by which we need to multiply the unconditional probability. All remaining digits contribute a factor of $5/2$ because deleting them doesn't make the number divisible by $2$, and we don't need any correcting factors for the mod $3$ restrictions for the remaining digits, for the same reasons as above, so we're done. Putting it all together, we have for the sum of the probabilities for all $m$-digit numbers with non-repeated last digit:

$$ \begin{eqnarray} && \frac{175}{64}p_m\sum_{k=2}^{m}9^k{\left(m-2 \atop k-2\right)} \left(\frac{5}{2}\right)^{k-2}p_{m-1}^k \\ &=& 9^2\frac{175}{64}p_mp_{m-1}^2\sum_{k=0}^{m-2}9^k{\left(m-2 \atop k\right)} \left(\frac{5}{2}\right)^kp_{m-1}^k \\ &=& \frac{14175}{64}p_mp_{m-1}^2\left(1+\frac{45}{2}p_{m-1}\right)^{m-2}\;, \end{eqnarray} $$

where the original sum starts at $2$ since the last digit necessarily repeats for $k=1$.

The ratio between the two results is $(35/4)p_{m-1}\approx(35/4)/((m-1)\log 10)\approx 3.8/(m-1)$, so for small $m$ there are more numbers with the last digit not repeated and for large $m$ there are more with the last digit repeated, in agreement with the numerical data.

We can bound $p_m$ by $1/\log m$ from above as before, and then show as before that for both cases the sum over the probabilities for all $m$ converges. To get a better estimate, we can take $p_m$ to be the average density of primes for all $m$-digit numbers, which is

$$p_m=\frac{\frac{10^m}{\log10^m}-\frac{10^{m-1}}{\log10^{m-1}}}{10^m-10^{m-1}}=\frac{1}{9\log 10}\left(\frac{10}{m}-\frac{1}{m-1}\right)\;.$$

Here are plots of the repeated case, the non-repeated case and the total. The maxima are at $m=18$, $7$ and $15$ with maximal values of and $21$, $8$ and $26$, respectively. The expected total counts (summed numerically) are $1794$, $209$ and $2003$, respectively. Thus, we can expect there to be around $2000$ of these numbers in total; about half of these have $74$ digits or more.

Here's a table comparing the actual counts for $3$ to $11$ digits (using the numerical data I report further down) to the above estimates:

$$ \begin{array}{|c|c|c|c|} m&\text{last digit repeated}&\text{last digit not repeated}&\text{total}\\\hline\\ \begin{array}{c} \mathrm{\vphantom{actual}\vphantom{estimate}}\\ \\ 2\\ 3\\ 4\\ 5\\ 6\\ 7\\ 8\\ 9\\ 10\\ 11\\ \end{array} & \begin{array}{c|c} \mathrm{actual}&\mathrm{estimate}\\ \hline\\ 0&\\ 1&4\\ 3&6\\ 7&8\\ 12&11\\ 5&13\\ 8&15\\ 11&16\\ 21&17\\ 16&19\\ \end{array} & \begin{array}{c|c} \mathrm{actual}&\mathrm{estimate}\\ \hline\\ 4&\\ 10&6\\ 11&7\\ 9&8\\ 6&8\\ 8&8\\ 6&8\\ 7&8\\ 1&7\\ 7&7\\ \end{array} & \begin{array}{c|c} \mathrm{actual}&\mathrm{estimate}\\ \hline\\ 4&\\ 11&10\\ 14&13\\ 16&16\\ 18&19\\ 13&21\\ 14&22\\ 18&24\\ 22&25\\ 23&25\\ \end{array} \end{array} $$

The agreement turns out to be quite good, though there seems to be a slight overestimation now. I have no explanation for this, since the only systematic effect I can think of that I haven't taken into account is the correlation between the size of the original number and the size of the modified numbers, which should increase rather than decrease the probabilities. (In case you're wondering whether this is a case of changing the theory until it fits the data, it was actually the other way around: I got several incorrect results for the non-repeated case that would have removed the overestimation, but I believe the above analysis is the correct one.)

Here are all the numbers with up to $11$ digits (up to the 4,239,555,920th prime); I checked that they coincide with the ones already reported, but I'm putting them all here to have them together in one place:

23, 37, 53, 73, 113, 131, 137, 173, 179, 197, 311, 317, 431, 617, 719, 1013, 1031, 1097, 1499, 1997, 2239, 2293, 3137, 4019, 4919, 6173, 7019, 7433, 9677, 10193, 10613, 11093, 19973, 23833, 26833, 30011, 37019, 40013, 47933, 73331, 74177, 90011, 91733, 93491, 94397, 111731, 166931, 333911, 355933, 477797, 477977, 633317, 633377, 665293, 700199, 719333, 746099, 779699, 901499, 901997, 944777, 962233, 991733, 1367777, 1440731, 1799999, 2668999, 3304331, 3716633, 4437011, 5600239, 6666437, 6913337, 7333331, 7364471, 7391117, 13334117, 22255999, 33771191, 38800999, 40011197, 40097777, 44333339, 49473377, 79994177, 86000899, 93361493, 94400477, 99396617, 99917711, 110499911, 144170699, 199033997, 222559399, 333904433, 461133713, 469946111, 640774499, 679774391, 680006893, 711110111, 716664317, 743444477, 889309999, 900117773, 982669999, 999371099, 999444431, 1113399311, 1133333777, 1176991733, 1466664677, 1667144477, 1716336911, 2350000999, 3336133337, 3355522333, 3443339111, 3973337999, 4111116011, 4900001111, 6446999477, 6666116411, 6689899999, 6914333711, 7463333477, 8555555599, 8888333599, 8936093833, 9746666477, 10000000097, 10666610333, 11100077711, 11793334733, 19000019333, 23525555899, 30001114937, 33008299999, 33110666399, 33399911777, 37796941199, 40470660977, 44434133339, 46661333333, 46666133333, 55888856893, 61077333377, 66664441373, 66700447613, 66993413393, 71111164499, 77443733111, 99444901133

Some interesting specimens are 10000000097 (which is the only one with lots of $0$s after the first digit, showing that there's no significant effect from the overestimation of the size of the modified numbers in such a case) and the pair 46661333333 and 46666133333.

Here's the Java program I used to find these (using a bit set as Fixee suggested to cram as many primes as possible into my 8GB). The computation took half an hour on a MacBook Pro i7.

public class EdiblePrimes { static class LongBitSet { int [] bits; public LongBitSet (long nbits) { bits = new int [(int) ((nbits + 0x1f) >> 5)]; } public void clear (long bit) { bits [index (bit)] &= ~mask (bit); } public void set (long bit) { bits [index (bit)] |= mask (bit); } public boolean get (long bit) { return (bits [index (bit)] & mask (bit)) != 0; } private static int mask (long bit) { return 1 << (bit & 0x1f); } private static int index (long bit) { return (int) (bit >> 5); } } final static long max = 0x1800000000L; final static long maxIndex = max >> 1; public static void main (String [] args) { LongBitSet composite = new LongBitSet (maxIndex); // bit at index n says whether 2n+1 is composite composite.set (0); // 1 isn't prime int maxDivisor = (int) (Math.sqrt (max) + 1); for (int divisor = 3;divisor < maxDivisor;divisor += 2) if (!composite.get (divisor >> 1)) for (long multiple = (3 * divisor) >> 1;multiple < maxIndex;multiple += divisor) composite.set (multiple); outer: for (long n = 11;n < max;n += 2) { if (!composite.get (n >> 1)) { long power = 1; do { long nextPower = 10 * power; long modified = n % power + (n / nextPower) * power; if (modified != 2 && ((modified & 1) == 0 || composite.get (modified >> 1))) continue outer; power = nextPower; } while (power < n); System.out.println (n); } } }

[Update:]

Since the probability has a simple $k$ dependence, we can calculate the expectation value for $k$, i.e. the expected number of strings of repeated digits. For the shifted sums over $k$, we have

$$ \begin{eqnarray} && \frac{ \sum\left(m-2 \atop k\right) k q^k }{ \sum\left(m-2 \atop k\right) q^k } \\ &=& \frac{ q\frac{\partial}{\partial q}\sum\left(m-2 \atop k\right)q^k }{ \sum\left(m-2 \atop k\right) q^k } \\ &=& \frac{ q\frac{\partial}{\partial q}(1+q)^{m-2} }{ (1+q)^{m-2} } \\ &=& \frac{ (m-2)q(1+q)^{m-3} }{ (1+q)^{m-2} } \\ &=& \frac{m-2}{1+q^{-1}}\;. \end{eqnarray} $$

With $q\approx 45/(2(m-1)\log10)$, this becomes

$$ \frac{m-2}{1+\frac{2(m-1)\log10}{45}}\to_{m\to\infty}\frac{45}{2\log{10}}\approx 10\;. $$

We shifted $k$ down by $1$ and $2$ for numbers with with last digit repeated and not repeated, respectively. Thus, if we don't count a non-repeated last digit as a separate string, we get the same expectation value in both cases, namely $1$ more than the above value. Thus, in the limit of large $n$, there will be about $11$ strings of repeated digits on average (not counting non-repeated last digits), but the values for small $m$ are not near the limit because of the large factor of $45$. Here's a comparison of the actual averages with the estimated expectation values (using the above better estimate for $p_m$):

$$ \begin{array}{|c|c|c|} \hline\\ m&\text{actual}&\text{estimated}\\ \hline\\ 2&1.0&\\ 3&1.9&1.8\\ 4&2.8&2.5\\ 5&3.5&3.1\\ 6&3.8&3.6\\ 7&4.1&4.1\\ 8&4.4&4.5\\ 9&5.3&4.8\\ 10&5.4&5.1\\ 11&5.7&5.4 \end{array} $$

Here, too, the agreement is quite good, but there seems to be a slight systematic error left.