
Suppose my genome is 3 million bases long and my reads are 100 nucleotides long. I need to know how many reads are required to cover the entire genome.

I start from the equation $C = \frac{N \cdot L}{G}$, where $C$ is the coverage, $N$ the number of reads, $L$ the length of a read and $G$ the length of the haploid genome. I also know that the coverage at a given position follows a Poisson distribution with mean $C$, so I expect to need this to find $C$.
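To make the equation concrete, here is a minimal sketch in Python of the relation between $C$ and $N$ for my numbers. The function names `coverage` and `reads_needed` are only illustrative; they are not from my slides.

```python
import math

def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average coverage C = N * L / G."""
    return n_reads * read_len / genome_len

def reads_needed(target_coverage: float, read_len: int, genome_len: int) -> int:
    """Smallest integer N such that N * L / G >= target_coverage."""
    return math.ceil(target_coverage * genome_len / read_len)

G = 3_000_000  # genome length in bases
L = 100        # read length in nucleotides

# 30,000 reads of length 100 give an average coverage of exactly 1.
print(coverage(30_000, L, G))
print(reads_needed(1.0, L, G))
```

So the remaining question is only which value of $C$ to plug in.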

On my slides, in fact, I have the following:

$$ C = \frac{N \cdot L}{G} \approx \ln \left( \frac{G}{L \cdot \epsilon}\right), $$

if we assume that this is the average coverage needed to cover the entire genome with probability $1-\epsilon$.

The problem is that I do not understand how to arrive at the expression on the right. I thought that if we want a coverage of at least 1 with probability $1-\epsilon$, we would have to solve $1 - P(X=0) = 1 - e^{-\frac{N \cdot L}{G}} = 1 - \epsilon$, which equals $0.99$ if we take $\epsilon=0.01$. But this does not give the same result as the equation above.
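To show the discrepancy numerically, here is a small Python sketch comparing the two values of $C$ for my numbers. The variable names are my own, and `c_per_base` is just my Poisson attempt solved for $C$, i.e. $C = -\ln \epsilon$; I am not claiming that either computation is the intended derivation.

```python
import math

G = 3_000_000  # genome length in bases
L = 100        # read length in nucleotides
eps = 0.01     # allowed failure probability

# My attempt: solve 1 - exp(-C) = 1 - eps for C.
c_per_base = -math.log(eps)

# Expression from the slides: C ~ ln(G / (L * eps)).
c_slide = math.log(G / (L * eps))

# Corresponding number of reads, N = C * G / L, rounded up.
n_per_base = math.ceil(c_per_base * G / L)
n_slide = math.ceil(c_slide * G / L)

print(f"my attempt: C = {c_per_base:.2f}, N = {n_per_base}")
print(f"slides:     C = {c_slide:.2f}, N = {n_slide}")
print(f"difference in C = {c_slide - c_per_base:.2f}")
```

My attempt gives $C \approx 4.6$, while the slide formula gives $C \approx 14.9$. Algebraically the two differ by exactly $\ln(G/L)$, which is the term I cannot account for.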