Shortly after my post about speeding up Python with Cython, I was contacted by Mark Dufour, creator of ShedSkin, a Python-to-C++ compiler, who wanted to try my code with his compiler. I had heard of ShedSkin before, but I had chalked it up as something to try later, or something too hard to try (C++ is not my forte).

After Mark contacted me, I decided to give it a go on the code from that post and, to my great surprise, it performed a bit better than Cython with no changes to my code. ShedSkin does require that you program in a restricted subset of Python, but most of my scientific code is written in that style anyway (it’s not really that restrictive). Since then, I have used ShedSkin for all my other assignments, and now I’m writing about it.

A few days ago I had a bioinformatics assignment whose goal was to predict the location of proteins from their structure. I wrote an SVM to classify the proteins, compiled it with ShedSkin and ran it. Below is a sample of the Python code and the same code modified for ShedSkin.

Before:

def train_adatron(kernel_matrix, label_matrix, h, c):
    tolerance = 0.5
    alphas = [[0.0] * len(kernel_matrix) for _ in range(len(label_matrix[0]))]
    betas = [[0.0] * len(kernel_matrix) for _ in range(len(label_matrix[0]))]
    bias = [0.0] * len(label_matrix[0])
    labelalphas = [0.0] * len(kernel_matrix)
    max_differences = [(0.0, 0)] * len(label_matrix[0])
    for iteration in range(10 * len(kernel_matrix)):
        if not iteration % 100:
            print "Starting iteration %s..." % iteration
        for klass in range(len(label_matrix[0])):
            max_differences[klass] = (0.0, 0)
            for elem in range(len(kernel_matrix)):
                labelalphas[elem] = label_matrix[elem][klass] * alphas[klass][elem]
            for col_counter in range(len(kernel_matrix)):
                prediction = 0.0
                for row_counter in range(len(kernel_matrix)):
                    prediction += kernel_matrix[col_counter][row_counter] * \
                                  labelalphas[row_counter]
                g = 1.0 - ((prediction + bias[klass]) * label_matrix[col_counter][klass])
                betas[klass][col_counter] = min(max(alphas[klass][col_counter] + h * g, 0.0), c)
                difference = abs(alphas[klass][col_counter] - betas[klass][col_counter])
                if difference > max_differences[klass][0]:
                    max_differences[klass] = (difference, col_counter)

After:

def train_adatron(kernel_matrix, label_matrix, h, c):
    tolerance = 0.5
    alphas = [[0.0] * len(kernel_matrix) for _ in range(len(label_matrix[0]))]
    betas = [[0.0] * len(kernel_matrix) for _ in range(len(label_matrix[0]))]
    bias = [0.0] * len(label_matrix[0])
    labelalphas = [0.0] * len(kernel_matrix)
    max_differences = [(0.0, 0)] * len(label_matrix[0])
    for iteration in range(10 * len(kernel_matrix)):
        if not iteration % 100:
            print "Starting iteration %s..." % iteration
        for klass in range(len(label_matrix[0])):
            max_differences[klass] = (0.0, 0)
            for elem in range(len(kernel_matrix)):
                labelalphas[elem] = label_matrix[elem][klass] * alphas[klass][elem]
            for col_counter in range(len(kernel_matrix)):
                prediction = 0.0
                for row_counter in range(len(kernel_matrix)):
                    prediction += kernel_matrix[col_counter][row_counter] * \
                                  labelalphas[row_counter]
                g = 1.0 - ((prediction + bias[klass]) * label_matrix[col_counter][klass])
                betas[klass][col_counter] = min(max(alphas[klass][col_counter] + h * g, 0.0), c)
                difference = abs(alphas[klass][col_counter] - betas[klass][col_counter])
                if difference > max_differences[klass][0]:
                    max_differences[klass] = (difference, col_counter)

You might notice that the two snippets are identical. That’s how awesome ShedSkin is. It didn’t need a single change, and on top of that, it gave me compile-time errors when I messed up my code.
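If you are curious what that function actually computes, here is a minimal single-class sketch of the kernel Adatron update on a hypothetical toy dataset. The data, the linear kernel, and the parameter values are invented for illustration (the bias term is held at zero for simplicity); this is not the assignment's code or data.

```python
# Minimal single-class sketch of the kernel Adatron update.
# Toy data and a linear kernel; the bias is held at zero for simplicity.

def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))

def adatron(points, labels, h=0.1, c=10.0, sweeps=200):
    n = len(points)
    kernel_matrix = [[linear_kernel(points[i], points[j]) for j in range(n)]
                     for i in range(n)]
    alphas = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Current decision value for sample i under the dual solution.
            prediction = sum(kernel_matrix[i][j] * labels[j] * alphas[j]
                             for j in range(n))
            g = 1.0 - prediction * labels[i]
            # Additive update, clipped to the box [0, c] as in the listing above.
            alphas[i] = min(max(alphas[i] + h * g, 0.0), c)
    return alphas

def predict(points, labels, alphas, x):
    score = sum(alphas[j] * labels[j] * linear_kernel(points[j], x)
                for j in range(len(points)))
    return 1 if score >= 0.0 else -1

points = [(-2.0, -1.0), (-1.0, -1.0), (1.0, 1.0), (2.0, 1.0)]
labels = [-1, -1, 1, 1]
alphas = adatron(points, labels)
print(predict(points, labels, alphas, (0.5, 0.5)))    # expected: 1
print(predict(points, labels, alphas, (-0.5, -0.5)))  # expected: -1
```

The real code simply runs this update once per class (the klass loop) and tracks the largest change per sweep to decide when to stop.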

The timings of the pure Python and ShedSkin compiled code are:

python          shedskin
-------------   ------------
4841.94 sec     103.30 sec

You can find my code in the ShedSkin repository.

That is a 47x speedup (not 47%, 47 times), just from running two commands to compile my code to C++ and the C++ to machine code. Needless to say, I will be using ShedSkin a lot more in the future.
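For reference, those two commands look roughly like this, assuming the script is saved as svm.py (the filename here is an assumption):

shedskin svm.py   # translate the restricted-Python source to C++ and emit a Makefile
make              # compile the generated C++ into a native executable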