For the last two years, I’ve done almost all of my work in Cython. And I don’t mean, I write Python, and then “Cythonize” it, with various type-declarations etc. I just, write Cython.

I use “raw” C structs and arrays, and occasionally C++ vectors, with a thin wrapper around malloc/free that I wrote myself. The code is almost always exactly as fast as C/C++, because it really is just C/C++ with some syntactic sugar — but with Python “right there”, should I need/want it.

This is basically the inverse of the old promise that languages like Python came with: that you would write your whole application in Python, optimise the “hot spots” with C, and voila! C speed, Python convenience, and money in the bank.

This was always much nicer in theory than in practice. In practice, your data structures have a huge influence on both the efficiency of your code and how annoying it is to write. Arrays are a pain, but fast; lists are blissfully convenient, but very slow. Python loops and function calls are also quite slow, so the part you have to write in C tends to wriggle its way up the stack, until it’s almost your whole application.

Today a post came up on HN, on writing C extensions for Python. The author wrote both a pure Python implementation, and a C implementation, using the Numpy C API. This seemed a good opportunity to demonstrate the difference, so I wrote a Cython implementation for comparison:

```cython
import random

from cymem.cymem cimport Pool
from libc.math cimport sqrt

cimport cython


cdef struct Point:
    double x
    double y


cdef class World:
    cdef Pool mem
    cdef int N
    cdef double* m
    cdef Point* r
    cdef Point* v
    cdef Point* F
    cdef readonly double dt

    def __init__(self, N, threads=1, m_min=1, m_max=30.0, r_max=50.0,
                 v_max=4.0, dt=1e-3):
        self.mem = Pool()
        self.N = N
        self.m = <double*>self.mem.alloc(N, sizeof(double))
        self.r = <Point*>self.mem.alloc(N, sizeof(Point))
        self.v = <Point*>self.mem.alloc(N, sizeof(Point))
        self.F = <Point*>self.mem.alloc(N, sizeof(Point))
        for i in range(N):
            self.m[i] = random.uniform(m_min, m_max)
            self.r[i].x = random.uniform(-r_max, r_max)
            self.r[i].y = random.uniform(-r_max, r_max)
            self.v[i].x = random.uniform(-v_max, v_max)
            self.v[i].y = random.uniform(-v_max, v_max)
            self.F[i].x = 0
            self.F[i].y = 0
        self.dt = dt


@cython.cdivision(True)
def compute_F(World w):
    """Compute the force on each body in the world, w."""
    cdef int i, j
    cdef double s3, tmp
    cdef Point s
    cdef Point F
    # Zero out the forces from the previous step before accumulating,
    # in a separate pass so the pairwise updates below aren't clobbered.
    for i in range(w.N):
        w.F[i].x = 0
        w.F[i].y = 0
    for i in range(w.N):
        for j in range(i + 1, w.N):
            s.x = w.r[j].x - w.r[i].x
            s.y = w.r[j].y - w.r[i].y
            s3 = sqrt(s.x * s.x + s.y * s.y)
            s3 *= s3 * s3
            tmp = w.m[i] * w.m[j] / s3
            F.x = tmp * s.x
            F.y = tmp * s.y
            w.F[i].x += F.x
            w.F[i].y += F.y
            w.F[j].x -= F.x
            w.F[j].y -= F.y


@cython.cdivision(True)
def evolve(World w, int steps):
    """Evolve the world, w, through the given number of steps."""
    cdef int _, i
    for _ in range(steps):
        compute_F(w)
        for i in range(w.N):
            w.v[i].x += w.F[i].x * w.dt / w.m[i]
            w.v[i].y += w.F[i].y * w.dt / w.m[i]
            w.r[i].x += w.v[i].x * w.dt
            w.r[i].y += w.v[i].y * w.dt
```

The Cython version took about 30 minutes to write, and it runs just as fast as the C code — because, why wouldn’t it? It is C code, really, with just some syntactic sugar. And you don’t even have to learn or think about a foreign, complicated C API… You just, write C. Or C++ — although that’s a little more awkward. Both the Cython version and the C version are about 70x faster than the pure Python version, which uses Numpy arrays.
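For a sense of what the slow side of that comparison looks like, here is a rough NumPy reconstruction of the pure-Python version — my own sketch of the same algorithm, not the original post's exact code:

```python
import numpy as np

def compute_F(m, r, F):
    """Accumulate pairwise gravitational forces into F, one body at a time."""
    N = len(m)
    F[:] = 0
    for i in range(N):
        s = r[i+1:] - r[i]                  # displacements to bodies j > i
        s3 = np.sum(s * s, axis=1) ** 1.5   # |s|^3 for each pair
        tmp = m[i] * m[i+1:] / s3
        Fi = tmp[:, None] * s
        F[i] += Fi.sum(axis=0)              # force on body i
        F[i+1:] -= Fi                       # equal and opposite on bodies j

def evolve(m, r, v, F, dt, steps):
    """Leapfrog-style update: recompute forces, then step velocities and positions."""
    for _ in range(steps):
        compute_F(m, r, F)
        v += F * dt / m[:, None]
        r += v * dt
```

Even vectorised per body like this, the per-pair work still runs through Python-level loops and temporary arrays, which is where the ~70x gap comes from.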

One difference from C: I wrote a little wrapper around malloc/free, cymem. All it does is remember the addresses it served, and when the Pool is garbage-collected, it frees the memory it allocated. I’ve had no trouble with memory leaks since I started using this.
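The idea is simple enough to sketch in a few lines of pure Python — this is my own illustration of the concept via ctypes, not cymem's actual implementation:

```python
import ctypes

# Resolve malloc/free from the symbols of the running process.
# (ctypes.CDLL(None) works on Linux and macOS; Windows needs the CRT DLL.)
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]


class Pool:
    """Hand out malloc'd blocks, remember their addresses, and free them
    all when the Pool itself is garbage-collected."""

    def __init__(self):
        self._addresses = []

    def alloc(self, number, size):
        addr = libc.malloc(number * size)
        self._addresses.append(addr)
        return addr

    def __del__(self):
        for addr in self._addresses:
            libc.free(addr)
```

Because the World object holds the Pool, the arrays live exactly as long as the World does, and there is no per-pointer bookkeeping to get wrong.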

The “intermediate” way of writing Cython, using typed memory-views, allows you to use the Numpy multi-dimensional array features. However, to me it feels more complicated, and the applications I tend to write involve very sparse arrays — where, once again, I want to define my own data structures.
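For contrast, the memory-view style looks roughly like this — a sketch with names of my own, not code from the post:

```cython
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def advance(double[:, :] r, double[:, :] v, double dt):
    """Advance positions in place; r and v can be NumPy (N, 2) arrays."""
    cdef int i
    for i in range(r.shape[0]):
        r[i, 0] += v[i, 0] * dt
        r[i, 1] += v[i, 1] * dt
```

You get NumPy interoperability for free, but you're back to thinking in terms of dense rectangular arrays, which is exactly what my sparse data structures are trying to avoid.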