June 28, 2018 at 06:29 · Tags: Python

The other day, while playing with a simple program involving randomness, I noticed something strange. Python's random.randint() function feels quite slow, in comparison to other randomness-generating functions. Since randint() is the canonical answer for "give me a random integer" in Python, I decided to dig deeper to understand what's going on.

This is a brief post that dives into the implementation of the random module, and discusses some alternative methods for generating pseudo-random integers.

First, a basic benchmark (Python 3.6):

```
$ python3 -m timeit -s 'import random' 'random.random()'
10000000 loops, best of 3: 0.0523 usec per loop
$ python3 -m timeit -s 'import random' 'random.randint(0, 128)'
1000000 loops, best of 3: 1.09 usec per loop
```

Whoa! It's about 20x more expensive to generate a random integer in the range [0, 128] than to generate a random float in the range [0, 1). That's pretty steep, indeed.

To understand why randint() is so slow, we'll have to dig into the Python source. Let's start with random(). In Lib/random.py, the exported function random is an alias for the random method of the class Random, which inherits this method directly from _random.Random, the C companion class defined in Modules/_randommodule.c. That module implements its random method as follows:

```c
static PyObject *
random_random(RandomObject *self, PyObject *Py_UNUSED(ignored))
{
    uint32_t a = genrand_int32(self) >> 5, b = genrand_int32(self) >> 6;
    return PyFloat_FromDouble((a * 67108864.0 + b) * (1.0 / 9007199254740992.0));
}
```

Here genrand_int32, defined directly above, implements one step of the Mersenne Twister PRNG. All in all, when we call random.random() in Python, the C function is invoked directly and there's not much extra work beyond converting the result of genrand_int32 to a floating-point number in a single line of C.
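To see what that C code computes, here's a small Python sketch (my own reconstruction, not library code) that rebuilds the same 53-bit double from two 32-bit Mersenne Twister words. It relies on the CPython detail that getrandbits(32) returns one raw genrand_int32() word:

```python
import random

def random_like():
    # Mirror random_random() above: take the top 27 bits of one
    # 32-bit word and the top 26 bits of the next, then scale.
    a = random.getrandbits(32) >> 5   # 27 bits
    b = random.getrandbits(32) >> 6   # 26 bits
    # 67108864 == 2**26, 9007199254740992 == 2**53
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)

# Seeding identically shows it tracks random.random() exactly (in CPython).
random.seed(2018)
expected = random.random()
random.seed(2018)
assert random_like() == expected
```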

Now let's take a look at what randint() is up to:

```python
def randint(self, a, b):
    """Return random integer in range [a, b], including both end points.
    """
    return self.randrange(a, b+1)
```

It calls randrange, fair enough. Here it is:

```python
def randrange(self, start, stop=None, step=1, _int=int):
    """Choose a random item from range(start, stop[, step]).

    This fixes the problem with randint() which includes the
    endpoint; in Python this is usually not what you want.
    """
    # This code is a bit messy to make it fast for the
    # common case while still doing adequate error checking.
    istart = _int(start)
    if istart != start:
        raise ValueError("non-integer arg 1 for randrange()")
    if stop is None:
        if istart > 0:
            return self._randbelow(istart)
        raise ValueError("empty range for randrange()")

    # stop argument supplied.
    istop = _int(stop)
    if istop != stop:
        raise ValueError("non-integer stop for randrange()")
    width = istop - istart
    if step == 1 and width > 0:
        return istart + self._randbelow(width)
    if step == 1:
        raise ValueError("empty range for randrange() (%d, %d, %d)" %
                         (istart, istop, width))

    # Non-unit step argument supplied.
    istep = _int(step)
    if istep != step:
        raise ValueError("non-integer step for randrange()")
    if istep > 0:
        n = (width + istep - 1) // istep
    elif istep < 0:
        n = (width + istep + 1) // istep
    else:
        raise ValueError("zero step for randrange()")

    if n <= 0:
        raise ValueError("empty range for randrange()")

    return istart + istep * self._randbelow(n)
```

That's quite a bit of case checking and setting up parameters before we get to the next level. There are a couple of fast-path cases (for example, when the stop parameter is not supplied, this function will be a bit faster), but overall after a bunch of checking we get to call the _randbelow() method.

By default, _randbelow() gets mapped to _randbelow_with_getrandbits() :

```python
def _randbelow_with_getrandbits(self, n):
    "Return a random int in the range [0,n).  Raises ValueError if n==0."

    getrandbits = self.getrandbits
    k = n.bit_length()  # don't use (n-1) here because n can be 1
    r = getrandbits(k)  # 0 <= r < 2**k
    while r >= n:
        r = getrandbits(k)
    return r
```
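The rejection loop may look wasteful (why not just take getrandbits(k) % n?), but a plain modulo would skew the distribution. A tiny deterministic illustration:

```python
from collections import Counter

# Enumerate all 3-bit values 0..7 and reduce them mod 5: the values
# 5, 6, 7 wrap around onto 0, 1, 2, so those outcomes occur twice as
# often as 3 and 4. Rejection sampling avoids this bias.
counts = Counter(r % 5 for r in range(8))
print(counts)  # 0, 1, 2 each appear twice; 3 and 4 only once
```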

Note that it does a couple more computations and can end up invoking getrandbits() multiple times (especially if n is far from a power of two). getrandbits() is implemented in C, and while it also ends up invoking the PRNG genrand_int32(), it's somewhat heavier than random() and runs about twice as slow.
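We can watch those repeated invocations directly with a small subclass that counts getrandbits() calls. CountingRandom is my own illustrative name, and it pokes at the private _randbelow() method, so treat this as a demo rather than production code:

```python
import random

class CountingRandom(random.Random):
    """A random.Random that counts how often getrandbits() is called."""
    def __init__(self, *args):
        self.calls = 0
        super().__init__(*args)

    def getrandbits(self, k):
        self.calls += 1
        return super().getrandbits(k)

rng = CountingRandom(42)
N = 10000
for _ in range(N):
    rng._randbelow(129)  # k = 8 bits; only 129/256 draws are accepted

# Expect about 256/129, roughly 1.98, getrandbits() calls per integer.
print(rng.calls / N)
```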

In other words, there's a lot of Python and C code in the way to invoke the same underlying C function. Since Python is bytecode-interpreted, all of this ends up being quite a bit slower than simply calling the C function directly. A death by a thousand cuts. To be fair, randint() is also more flexible in that it can generate pseudo-random numbers of any size; that said, it's not very common to need huge pseudo-random numbers, and our tests were with small numbers anyway.

Here's a couple of experiments to help us test this hypothesis. First, let's try to hit the fast-path we've seen above in randrange , by calling randrange without a stop parameter:

```
$ python3 -m timeit -s 'import random' 'random.randrange(1)'
1000000 loops, best of 3: 0.784 usec per loop
```

As expected, the run-time is somewhat better than randint . Another experiment is to rerun the comparison in PyPy, which is a JIT compiler that should end up tracing through the Python code and generating efficient machine code that strips a lot of abstractions.

```
$ pypy -m timeit -s 'import random' 'random.random()'
100000000 loops, best of 3: 0.0139 usec per loop
$ pypy -m timeit -s 'import random' 'random.randint(0, 128)'
100000000 loops, best of 3: 0.0168 usec per loop
```

As expected, the difference between these calls in PyPy is small.

Faster methods for generating pseudo-random integers

So randint() turns out to be very slow. In most cases no one cares, but occasionally we need many random numbers - so what is there to do?

One tried and true trick is to use random.random() instead, multiplying by our integer limit:

```
$ python3 -m timeit -s 'import random' 'int(128 * random.random())'
10000000 loops, best of 3: 0.193 usec per loop
```

This gives us pseudo-random integers in the range [0, 128), much faster. One word of caution: Python represents its floats in double precision, with 53 bits of accuracy. When the limit is above 2**53, the integers we get with this method are not quite random - bits will be missing. This is rarely a problem because we don't usually need such huge integers, but it's definitely something to keep in mind.

Another quick way to generate pseudo-random integers is to use getrandbits() directly:

```
$ python3 -m timeit -s 'import random' 'random.getrandbits(7)'
10000000 loops, best of 3: 0.102 usec per loop
```

This method is fast but limited - it only supports ranges that are powers of two. If we want a different range, we can't just compute a modulo - that would skew the distribution; instead we have to use a rejection loop similar to what _randbelow_with_getrandbits() does in the sample above, which slows things down again.

Finally, we can turn away from the random module altogether and use Numpy:

```
$ python3 -m timeit -s 'import numpy.random' 'numpy.random.randint(128)'
1000000 loops, best of 3: 1.21 usec per loop
```

Surprisingly, this is slow. That's because Numpy isn't great at working with single datums; it likes to amortize costs over large arrays created and manipulated in C. To see this in action, let's measure how long it takes to generate 100 random integers at once:

```
$ python3 -m timeit -s 'import numpy.random' 'numpy.random.randint(128, size=100)'
1000000 loops, best of 3: 1.91 usec per loop
```

Only 60% slower than generating a single one!
With 0.019 usec per integer, this is the fastest method by far - almost 3x faster than calling random.random(). The reason it's so fast is that the Python call overheads are amortized over all the generated integers, and deep inside, Numpy runs an efficient C loop to produce them. To conclude: use Numpy if you need to generate large numbers of random integers; if you're just generating them one at a time, it may not be as useful (but then, how much do you care about performance, really?)
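One way to exploit this even in one-at-a-time code is to draw a large batch up front and hand integers out from a buffer. Here's a sketch under my own naming (RandintBuffer is not a Numpy API), trading a little memory for far fewer Python-to-Numpy round trips:

```python
import numpy as np

class RandintBuffer:
    """Serve random ints in [0, limit) one at a time from batched
    numpy draws, refilling the buffer whenever it runs dry."""
    def __init__(self, limit, bufsize=10000):
        self.limit = limit
        self.bufsize = bufsize
        self._refill()

    def _refill(self):
        self._buf = np.random.randint(self.limit, size=self.bufsize)
        self._pos = 0

    def next(self):
        if self._pos == self.bufsize:
            self._refill()
        value = int(self._buf[self._pos])
        self._pos += 1
        return value

buf = RandintBuffer(128)
vals = [buf.next() for _ in range(25000)]  # spans multiple refills
```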