Benchmarking Parallel Python

Writing about web page http://www.artima.com/weblogs/viewpost.jsp?thread=214303

This post is Bruce Eckel’s follow-up to his previous post which covered, among other things, concurrency within Python. Basically, CPython has the Global Interpreter Lock (GIL), which prevents more than one thread from executing Python bytecode at a time and so makes life very awkward for those wanting to run Python on more than one processor.

Anyhow, in this post Bruce points to Parallel Python as an add-on module which is a potential solution. I had a look at this and thought it was pretty cool. However, bearing in mind Guido van Rossum’s post about the performance implications of removing the GIL the last time it was attempted, I thought I’d benchmark it to see whether it actually provides a speed-up.

The following stats are for calculating the sum of primes below every multiple of 10000 between 10^5 and 10^6 (including the lower bound and excluding the upper). The first set uses only one worker thread[0] of my Core Duo laptop and the second set uses two (as I have two processors).

It should be noted that the code snippet being used is provided as an example on the Parallel Python website, so it is probably close to a best case for the library. Regardless, I think the numbers are helpful.
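The workload itself is easy to reproduce. Here is a minimal sketch of the same task written against the standard-library multiprocessing module rather than the pp API the benchmark actually used, so the job-splitting details differ from the original snippet:

```python
import multiprocessing


def sum_primes(n):
    """Sum of all primes strictly below n, by naive trial division."""
    total = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            total += candidate
    return total


if __name__ == "__main__":
    # The benchmark's 90 inputs: every multiple of 10000 in [10^5, 10^6).
    inputs = range(100_000, 1_000_000, 10_000)
    # Demo on two small inputs so this sketch finishes quickly; the real
    # run would map sum_primes over all 90 values in `inputs`.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(sum_primes, [10_000, 20_000]))
```

With two pool workers the jobs are handed out to separate processes, which is how the GIL is sidestepped: each process has its own interpreter and its own lock.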

One Processor

Real Time Taken: 1153.53128409 s

Number of jobs: 90

Total Job Time: 1153.53128409 s

Time/Job: 12.816742 s

Two Processors

Real Time Taken: 601.201694012 s

Number of jobs: 90

Total Job Time: 1180.9738

Time/Job: 13.121931 s

It can be seen that running two worker threads increases the total CPU time used by around 27 seconds, but the fact that two processors are being used leads to a speed-up factor of roughly 1.92, which is pretty impressive.
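As a quick sanity check (not part of the original benchmark), the derived figures follow directly from the raw timings above:

```python
# Raw numbers from the two benchmark runs.
real_one = 1153.53128409   # wall-clock time, one worker (s)
real_two = 601.201694012   # wall-clock time, two workers (s)
cpu_two = 1180.9738        # total job (CPU) time across two workers (s)

# Speed-up is the ratio of wall-clock times; overhead is the extra
# CPU time the parallel run consumed compared with the serial run.
speedup = real_one / real_two    # ≈ 1.92
overhead = cpu_two - real_one    # ≈ 27.4 s
```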

—

[0] I’m not sure of the internals, so I don’t know if it is technically a thread. Regardless, only one calculation will happen at a time.