Background and Motivation¶

A large fraction of the code that I write has a performance requirement attached to it. Either I'm churning through a large amount of data in an analysis pipeline, or it is part of a real-time system and needs to complete a specific calculation in a constrained amount of time. Sometimes I can rely on numpy, pandas or existing Python packages that wrap C or Fortran code under the covers to get sufficient performance. Often times though, I'm dealing with algorithms that are difficult to implement efficiently using these tools.

Since I started coding primarily in Python ~6 years ago, in those instances I'd typically reach for Cython to either wrap something I or others wrote in C/C++/Fortran or to provide sufficient type information to my code so that Cython could generate a performant C-extension that I could call from Python. Although Cython has been a pretty rock solid solution for me, the amount of boilerplate often required and some of the strange semantic of mixing python and low-level C code often feels less than ideal. I also collaborate with people who know Python, but don't have backgrounds in C and/or haven't had enough experience with Cython to understand how it all fits together.

More and more frequently, I find myself using Numba in instances that I had traditionally used Cython. In short, through a simple decorator mechanism, Numba converts a subset of Python code into efficient machine code using LLVM. It uses type inference so you don't have to specify the type of every variable in a function like you do in Cython to generate fast code. This subset primarily deals with numerical code operating on scalars or Numpy arrays, but that covers 95% of the cases where I need efficient code so it does not feel that limiting. That said, the most common mistake I see people making with Numba is trying to use it as a general Python compiler and then being confused/disappointed when it doesn't speed up their code. The library has matured incredibly over the last 6-12 months to the point where at work we have it deployed in a couple of critical pieces of production code. When I first seriously prototyped it maybe a year and a half ago, it was super buggy and missing a number of key features (e.g. caching of jitted functions, memory management of numpy arrays, etc). But now it feels stable and I rarely run into problems, although I've written a very extensive unit test suite for every bit of code that it touches.

One of the limitations that I do encounter semi-regularly though is when I need some specialized function that is available in Numpy or Scipy, but that function has not been re-implemented in the Numba core library so it can be called in the so-called "nopython" mode. Basically this means that if you want to call one of these functions, you have to go through Numba's object mode, which typically cannot generate nearly as efficient code.

While there is a proposal under development that should allow external libraries to define an interface to make usable in nopython mode, it is not complete and will them require adoption within the larger Scipy/PyData communities. I'm looking forward to that day, but currently you have to choose a different option. The first is to re-implement a function yourself using Numba. This is often possible for functionality that is small and limited in scope, but for anything non-trivial this approach can rapidly become untenable.

In the remainder of this notebook, I'm going to describe a second technique that involves using CFFI to call external C code directly from within Numba jitted code. This turns out to be a really great solution if the functionality you want has already been written either in C or a language with a C interface. It is mentioned in the Numba docs, but there aren't any examples that I have seen, and looking at the tests only helped a little.

I had not used CFFI before integrating it with Numba for a recent project. I had largely overlooked it for two reasons: (1) Cython covered the basic usecase of exposing external C code to python and I was already very comfortable with Cython, and (2) I had the (incorrect) impression that CFFI was mostly useful in the PyPy ecosystem. Since PyPy is a non-starter for all of my projects, I largely just ignored its existence. I'm thankfully correcting that mistake now.