$\begingroup$

Taking aside the question if you would ever sample the smallest values if the differences between the largest and the smallest values are that big and the smallest values have near-zero probability, the solution to your problem is not that hard as it sounds. Imagine that you have some values

$$ \varepsilon_1, \varepsilon_2,\dots,\varepsilon_n = \log\pi_1, \log\pi_2,\dots,\log\pi_n $$

where $\sum_{i=1}^n \pi_i = 1$, so they can be thought as probabilities. So not to clutter the notation, let's assume that $\varepsilon_1, \varepsilon_2,\dots,\varepsilon_n$ values are sorted in increasing order. Imagine that values $\varepsilon_1, \varepsilon_2,\dots,\varepsilon_k$ are very small, i.e. $\sum_{i=1}^k \pi_i = \xi$ is some very small value, and most of the probability mass is accumulated in the the reminder $\sum_{i=k+1}^n \pi_i \approx 1$.

The simple two-step sampling scheme for such values would be

Define arbitrary pseudo-observation $\delta$ that appears with probability $\xi$. Sample $Y$ from $\{ \delta,\varepsilon_{k+1}, \varepsilon_{k+2},\dots,\varepsilon_n \}$ with probabilities $\xi,\pi_{k+1}, \pi_{k+2},\dots,\pi_n$. If you sampled $Y =\delta$, then sample from $\{ \varepsilon_1, \varepsilon_2,\dots,\varepsilon_k \}$ with probabilities $\exp(\varepsilon_1+c), \exp(\varepsilon_2+c),\dots,\exp(\varepsilon_k+c)$, where $c$ is some constant, such that $\sum_{i=1}^k \exp(\varepsilon_i + c) = 1$, and take the sampled $\varepsilon_i$ value as $Y$.

Finding appropriate $c$ would help you to get rid of the numerical precision issues (you can simply add some large constant to those values and then normalize them). If your $\pi_i$ values do not sum to unity, but you know $\xi$, you can normalize the $\pi_{k+1}, \pi_{k+2},\dots,\pi_n$ by replacing them with $\tfrac{\pi_i}{\sum_{j=k+1}^n \pi_j + \xi}$ values. If $\xi$ is unknown to you, then you would have to make some assumption about it's value, so that enough probability mass is accumulated in the small values.

Notice that this can be easily adapted to case when $\pi_i

ot\in (0,1)$, since you can first sample from some subset of "large" values, treating the reminder as a pseudo-observation as described above. Moreover, you can extend the two-step procedure to multi-step sampling where on each step you would sample some subset treated as pseudo-observation vs the rest of values, where values in each step are normalized so their probabilities sum to unity.

EDIT (2017-02-06):

It seems that you can use Gumbel-max trick to for random generation without leaving the log-space. This basically solves your problem, but since the above answer got up-voted, I'm leaving it.