March 09, 2013 at 05:41 Tags Python

In a previous post, I demonstrated how to use libffi to perform fully dynamic calls to C code, where "fully dynamic" means that even the types of the arguments and return values are determined at runtime.

Here I want to discuss how the same task is done from Python, both with the existing stdlib ctypes package and the new cffi library, developed by the PyPy team and a candidate for inclusion into the Python stdlib in the future.

With ctypes

I'll start with the shared object discussed before; the following code loads and runs it in Python using ctypes. I tested it on Python 3.2, but other versions should work too (including 2.7):

    from ctypes import cdll, Structure, c_int, c_double, c_uint

    lib = cdll.LoadLibrary('./libsomelib.so')
    print('Loaded lib {0}'.format(lib))

    # Describe the DataPoint structure to ctypes.
    class DataPoint(Structure):
        _fields_ = [('num', c_int),
                    ('dnum', c_double)]

    # Initialize the DataPoint[4] argument. Since we just use the DataPoint[4]
    # type once, an anonymous instance will do.
    dps = (DataPoint * 4)((2, 2.2), (3, 3.3), (4, 4.4), (5, 5.5))

    # Grab add_data from the library and specify its return type.
    # Note: .argtypes can also be specified
    add_data_fn = lib.add_data
    add_data_fn.restype = DataPoint

    print('Calling add_data via ctypes')
    dout = add_data_fn(dps, 4)
    print('dout = {0}, {1}'.format(dout.num, dout.dnum))

This is pretty straightforward. As far as dynamic language FFIs go, ctypes is pretty good. But we can do better.

The main problem with ctypes is that we have to fully repeat the C declarations to ctypes, using its specific API. For example, see the description of the DataPoint structure. The return type should also be explicitly specified. Not only is this a lot of work for wrapping non-trivial C libraries, it's also error prone. If you make a mistake translating a C header to a ctypes description, you will likely get a segfault at runtime which isn't easy to debug without having a debug build of Python available.

ctypes allows us to explicitly specify argtypes on a function for some measure of type checking, but this is only within the Python code - given that you got the declaration right, it will help with passing the correct types of objects. But if you didn't get the declaration right, nothing will save you.
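To make the argtypes remark concrete without needing libsomelib.so, here is a minimal sketch that uses the C library's own div function, which also returns a small struct by value. The choice of div, the DivResult name and the libc loading trick are my assumptions for illustration; they are not part of the sample library, and the sketch assumes a POSIX system.

```python
import ctypes
from ctypes import Structure, c_int
from ctypes.util import find_library

# Assumption: a POSIX system. If find_library returns None, CDLL(None) opens
# the running process itself, which already has the C library loaded.
libc = ctypes.CDLL(find_library('c'))

# Mirrors C's div_t. The field order (quot, rem) is an assumption that holds
# on glibc, musl and macOS; the C standard does not fix it.
class DivResult(Structure):
    _fields_ = [('quot', c_int),
                ('rem', c_int)]

div_fn = libc.div
div_fn.argtypes = [c_int, c_int]   # checked on every call from Python
div_fn.restype = DivResult         # struct returned by value, like add_data

result = div_fn(17, 5)
print('17 / 5 = {0} remainder {1}'.format(result.quot, result.rem))

# With argtypes set, an argument of the wrong Python type is rejected
# before the C function is ever called.
type_error = None
try:
    div_fn('seventeen', 5)
except ctypes.ArgumentError as err:
    type_error = err
    print('Rejected by argtypes: {0}'.format(err))
```

Note what this check does and does not cover: it compares the Python arguments against the declaration I wrote, not against the real C prototype. Had I declared the fields of DivResult with the wrong types, ctypes would have accepted the call just the same and handed back garbage.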

How does it work?

ctypes is a Python wrapper around libffi. The CPython project carries a version of libffi with it, and ctypes consists of a C extension module linking to libffi and Python code for the required glue. If you understand how to use libffi, it should be easy to see how ctypes works.

While libffi is quite powerful, it also has some limitations, which by extension apply to ctypes. For example, passing unions by value to dynamically-loaded functions is not supported. But overall, the benefits outweigh the limitations, which are not hard to work around when needed.
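The "fully dynamic" flavor carries over to ctypes: a function signature is just a Python object built at runtime. The sketch below assembles a prototype from plain strings with CFUNCTYPE and binds it to a symbol by name. The C_TYPES table and the dynamic_call helper are hypothetical names I made up for this example, and the C library functions abs and atof merely stand in for whatever library you are wrapping (POSIX assumed).

```python
import ctypes
from ctypes import CFUNCTYPE, c_char_p, c_double, c_int
from ctypes.util import find_library

# Assumption: a POSIX system; CDLL(None) falls back to the running process.
libc = ctypes.CDLL(find_library('c'))

# Hypothetical mapping from C type names to ctypes types, for this sketch only.
C_TYPES = {'int': c_int, 'double': c_double, 'char*': c_char_p}

def dynamic_call(lib, name, restype, argtypes, *args):
    """Look up `name` in `lib` and call it with a signature built at runtime."""
    prototype = CFUNCTYPE(C_TYPES[restype], *[C_TYPES[t] for t in argtypes])
    func = prototype((name, lib))   # bind the prototype to the exported symbol
    return func(*args)

# The signatures are ordinary data here; they could come from user input.
abs_result = dynamic_call(libc, 'abs', 'int', ['int'], -7)
atof_result = dynamic_call(libc, 'atof', 'double', ['char*'], b'2.5')
print(abs_result, atof_result)
```

Nothing about either signature is known when the script starts; the type description is put together just before each call, which is the same kind of runtime description that libffi works from.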

With cffi

cffi tries to improve on ctypes by using an interesting approach. It allows you to avoid re-writing your C declarations in ctypes notation, by being able to parse actual C declarations and inferring the required data types and function signatures automatically. Here's the same example implemented with cffi (tested with cffi 0.5 on Python 3.2):

    from cffi import FFI

    ffi = FFI()

    lib = ffi.dlopen('./libsomelib.so')
    print('Loaded lib {0}'.format(lib))

    # Describe the data type and function prototype to cffi.
    ffi.cdef('''
    typedef struct {
        int num;
        double dnum;
    } DataPoint;

    DataPoint add_data(const DataPoint* dps, unsigned n);
    ''')

    # Create an array of DataPoint structs and initialize it.
    dps = ffi.new('DataPoint[]', [(2, 2.2), (3, 3.3), (4, 4.4), (5, 5.5)])

    print('Calling add_data via cffi')
    # Interesting variation: passing invalid arguments to add_data will trigger
    # a cffi type-checking exception.
    dout = lib.add_data(dps, 4)
    print('dout = {0}, {1}'.format(dout.num, dout.dnum))

Instead of tediously describing the C declarations to Python, cffi just consumes them directly and produces all the required glue automatically. It's much harder to get things wrong and run into segfaults.

Note that this demonstrates what cffi calls the ABI level. There's another, more ambitious, use of cffi which uses the system C compiler to auto-complete missing parts of declarations. I'm just focusing on the ABI level here, since it requires no C compiler.

How does it work?

Deep down, cffi also relies on libffi to generate the actual low-level calls. To parse the C declarations, it uses pycparser.

Another cool thing about cffi is that being part of the PyPy ecosystem, it can actually benefit from PyPy's JIT capabilities. As I've mentioned in a previous post, using libffi is much slower than compiler-generated calls because a lot of the argument set-up work has to happen for each call. But once we actually start running, in practice the signatures of called functions never change. So a JIT compiler could be smarter about it and generate faster, more direct calls. I don't know whether PyPy is already doing this with cffi, but I'm pretty sure it's in their plans.
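To get a rough feel for the per-call overhead mentioned above, here is a small timing sketch. It uses ctypes rather than cffi so that it runs with the standard library alone, and it compares calling the C library's abs through ctypes against Python's built-in abs, which is a compiler-generated call inside the interpreter. The setup is my own assumption (POSIX, libc's abs as a stand-in for any tiny C function), and the gap it shows includes ctypes' own argument conversion, not only the libffi work.

```python
import ctypes
import timeit
from ctypes import c_int
from ctypes.util import find_library

# Assumption: a POSIX system; CDLL(None) falls back to the running process.
libc = ctypes.CDLL(find_library('c'))

c_abs = libc.abs
c_abs.argtypes = [c_int]
c_abs.restype = c_int

N = 200000  # number of calls per measurement

# Each ctypes call converts the arguments and sets up the foreign call anew.
ctypes_time = timeit.timeit(lambda: c_abs(-5), number=N)

# The built-in abs is reached through an ordinary compiled call path.
builtin_time = timeit.timeit(lambda: abs(-5), number=N)

print('ctypes abs:  {0:.1f} ns per call'.format(ctypes_time / N * 1e9))
print('builtin abs: {0:.1f} ns per call'.format(builtin_time / N * 1e9))
```

The absolute numbers depend on the machine and the Python version, so treat them as an illustration only. The point is that the function body is trivial in both cases, so most of the measured time is call machinery, which is exactly the part a JIT could specialize once it sees that the signature never changes.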