Extending Python with C: A case study

A 75x speedup with a C extension

I recently wrote about an algorithm for fuzzy matching of substrings implemented in Python. This is a feature that I needed for a piece of software I'm currently developing.
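That algorithm is the standard dynamic-programming variant of Levenshtein distance in which the pattern may begin and end anywhere in the text. A minimal sketch (not the exact code from the earlier post) looks like this:

```python
def fuzzy_substring(needle, haystack):
    """Return the minimum edit distance between needle and any
    substring of haystack (a sketch of the standard DP approach)."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # Row for the empty needle: all zeros, so a match can start at
    # any position in the haystack for free.
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        # Matching needle[:i] against an empty prefix costs i deletions.
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if needle[i - 1] == haystack[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # skip a needle char
                          curr[j - 1] + 1,     # skip a haystack char
                          prev[j - 1] + cost)  # match / substitute
        prev = curr
    # A match may also end anywhere, so take the cheapest cell in
    # the final row.
    return min(prev)
```

For example, fuzzy_substring(u"nedle", u"Find the needle in the haystack") comes out to 1: the cheapest alignment is against "needle" with one insertion.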

When I started using the fuzzy_substring function on some test cases, however, it was unacceptably slow. Using a modestly large test corpus and about 1,000 search terms, the function was taking about 30 seconds to run. Since this needs to be run in response to a user query, 30 seconds was just too long to wait.

First attempt

The first thing I tried to get this running faster was psyco. Merely by sticking a psyco.full() at the top of my script, the run time went down to 11 seconds. Nearly a 3x speedup for zero effort is pretty cool!

Psyco rocks, but 11 seconds was still too slow. So, I decided to bite the bullet and write this function as an extension module in C.

The C extension

I'd never written a pure-C extension module before, and hadn't touched C in about 10 years, so it was with some trepidation that I read the official docs on extending and embedding and an online tutorial by Michael Hudson.

It turned out to be incredibly easy. I simply copied an example C file and setup.py from the docs, filled in my method, ran setup.py, and pow! A shiny new subdist.pyd file! Since I'd already written the function in Python, it only took me about an hour to write it in C, and I could use the same unit tests to verify it.
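The setup script from the docs is only a few lines. A minimal sketch of it (the module and source names subdist and subdist.c are assumptions based on the .pyd file mentioned above; the Python 2.x docs used distutils, and on modern Pythons setuptools exposes the same setup/Extension interface):

```python
# setup.py -- illustrative build script for a one-file C extension.
from distutils.core import setup, Extension

setup(
    name="subdist",
    version="1.0",
    ext_modules=[Extension("subdist", sources=["subdist.c"])],
)
```

Running python setup.py build_ext --inplace then drops the compiled subdist.pyd (or .so on Unix) next to the script, ready to import.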

Update: I've set up a Web page for this module with code & binaries.

Benchmark

I ran a benchmark of the C extension against the pure-Python version (with and without psyco). For the benchmark, I took 1,000 unique words from the Python 2.5 README file as my needles, and lines 100 through 199 of the same file as my haystacks.

text = unicode(open("/python25/readme.txt").read())
sentences = text.splitlines()[100:200]
words = list(set(text.split()))[:1000]

I then got the fuzzy substring match for each word against each of my "sentences."

import time

def time_func(func, words, sentences):
    start = time.time()
    for sentence in sentences:
        for word in words:
            func(word, sentence)
    return time.time() - start

The speedup

Each function was run 10 times; here are the low, high, and median run times:

                 Low        Median     High
Python           53.6410 s  54.6560 s  55.2190 s
Python (psyco)   17.6100 s  17.7960 s  18.0620 s
C                 0.7030 s   0.7190 s   0.7340 s

Ratios

Python (psyco) to Python: 0.3256

C to Python (psyco): 0.0404

C to Python: 0.0132
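These ratios follow directly from the median timings in the table (a quick sanity check, not part of the original benchmark):

```python
# Recomputing the ratios from the median run times above.
python_median = 54.656  # pure Python, seconds
psyco_median = 17.796   # Python + psyco, seconds
c_median = 0.719        # C extension, seconds

print(round(psyco_median / python_median, 4))  # 0.3256
print(round(c_median / psyco_median, 4))       # 0.0404
print(round(c_median / python_median, 4))      # 0.0132
```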

So the C extension is roughly 75 times faster than the pure Python function, and almost 25 times faster than Python with psyco, all for an hour's work. Not too shabby!

There are actually a couple more tricks I could have used in the C code to get a further 10% to 30% speedup (e.g. not calculating the lower-left and upper-right corners of the dynamic-programming matrix), but it's already fast enough for my purposes. If I need to squeeze out more speed in the future, I'll add the somewhat obfuscating optimizations then.

Conclusion

Compiling extensions in C is incredibly easy, much easier than I expected. Pure C is especially suited to cases like this, where you can immediately get away from Python objects and work with pure C data types. Writing my fuzzy string matching algorithm as a C extension turned an algorithm that was sort of interesting into something that will actually be useful.

It's fantastic to be able to get this kind of speedup on performance bottlenecks, yet still write 99% of my code using pure Python.

Downloads

Downloads are available here. All code and binaries are released under the MIT license.

Usage

from subdist import substring

print substring(u"needle", u"Find the needle in the haystack")

Caveat: The function only accepts Unicode strings, and will raise an error if non-Unicode strings are passed in.