OpenSSL hangs CPU with Python <= 2.7.3 on Windows

If you use Python on Windows and you have programs or servers which allocate a lot of items on the heap (both of which we do), you should upgrade to Python 2.7.4. Especially if you do anything with HTTPS/SSL connections.

Python versions 2.7.3 and below use an older version of OpenSSL, which has a serious bug that can cause minutes-long, CPU-bound hangs in your Python process. Apart from the process taking over your CPU, the symptom we saw was a socket.error with the message “[Errno 10054] An existing connection was forcibly closed by the remote host”. This is because the HTTPS request is opened before the OpenSSL hang kicks in, and it takes so long that the remote server times out and closes the connection.

The cause of the bug is actually quite arcane: the Windows version of OpenSSL uses a Win32 function called Heap32Next to walk the heap and generate random data for cryptographic purposes.

However, a call to Heap32Next is O(N) if there are N items in the heap, so walking the heap is an O(N2) operation! Of course, if you’ve got 10 million items on the heap, this takes about 5 minutes. The first connection to an HTTPS server (which uses OpenSSL) essentially brings Python to a grinding halt for this time.

There’s a workaround: call the ssl.RAND_status() function on startup, before you’ve allocated the big data on your heap. That seemed to fix it, though we didn’t dig too deep to guarantee the fix. We were still running on Python 2.6, and given that the just-released 2.7.4 addressed this issue by using a newer version of OpenSSL, we fixed this by simply upgrading to Python 2.7.4. Note that even Python 2.7.3 has the older version of OpenSSL, so be careful.

Other interesting things we found while hunting down this bug: