Last month we had another instance of our quarterly hackathon. I had an urge to experiment a bit with computer vision, despite not having done anything related before.

Our hackathons are around 48 hours long, which I hoped would be long enough to do some simple facial recognition. My goal was to coarsely symmetrize a face in real time by dividing it in half and reflecting one half onto the other.

The bottom line: I did OK. I didn’t fully achieve my goal by the time the hackathon was over, but with about an hour of extra work I had something that was close to what I had in mind. Here’s a bit about how I did it.

Strategy

After spending an hour or so researching the computer vision libraries I’d have access to, I settled on OpenCV, which appeared to have all the functionality I needed. I even found that OpenCV ships with generated Python bindings, and that there are other projects which wrap OpenCV for Python in various ways. Each of them, though, uses either ctypes or a CPython C extension to do so, neither of which is particularly speedy on PyPy. So, to make things a bit more interesting, I decided to attempt CFFI bindings for OpenCV, both as an excuse to try a more serious CFFI project and because I hoped that what I ended up with would be fast (on PyPy and CPython) and easy to extend.

You can find the "finished" product at https://github.com/Magnetic/opencv-cffi, although as you’ll see in a moment, it only wraps the portions of the API that I needed to do the simple recognition mentioned above. It certainly did turn out to be easy to extend, though.

In Which We Begin at the End

Let’s take a look at what I ended up with.

Note: You can find this full example more or less alongside OpenCV-CFFI itself at https://github.com/Magnetic/opencv-cffi/blob/master/example.py. If you want to run it yourself, you’ll need the Haar cascade it uses. You can find instructions for downloading it in OpenCV-CFFI’s readme.

```python
import sys

from bp.filepath import FilePath

from opencv_cffi.imaging import Camera
from opencv_cffi.gui import Window
from opencv_cffi.object_detection import HaarClassifier


cascade_filepath = FilePath("./haarcascade_frontalface_default.xml")
classifier = HaarClassifier.from_path(cascade_filepath, canny_pruning=True)
camera = Camera()


def transformed(frame):
    for facetangle in classifier.detect_objects(inside=frame):
        with frame.region_of_interest(facetangle.right_half):
            prettified = frame.flipped_vertical()

        with frame.region_of_interest(facetangle.left_half):
            prettified.write_into(frame)
    return frame


with Window(name="Front") as front_window:
    front_window.loop_over(
        (transformed(frame) for frame in camera.frames()),
        handle_input=lambda key: sys.exit(),
    )
```

If you didn’t follow all that, the code above creates a GUI window (via OpenCV’s built-in GUI framework), then tells the window to stream frames out of a connected camera. For each frame, we find any faces in the frame, then flip the right half of each face and draw it on top of the left half.

The technique it uses for facial recognition is a classifier that comes bundled with OpenCV called a Haar classifier, so I didn’t even need to train the classifier to get something running.

Not bad for 48 hours of work, if I do say so myself (more or less 6 actual hours once you throw away distractions).
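To make the mirroring step itself a little more concrete, here is a minimal plain-Python sketch of it. It is not how OpenCV-CFFI does it: nested lists stand in for an image, and the function name and its parameters are made up for illustration.

```python
def mirror_right_onto_left(image, x, y, width, height):
    """Overwrite the left half of a rectangle with a mirror of its right half.

    `image` is a list of rows (lists of pixel values) and is modified in place.
    The rectangle stands in for a detected face (hypothetical layout).
    """
    half = width // 2
    for row in image[y:y + height]:
        # Take the columns of the right half, then reverse them so that
        # they read as a reflection when written over the left half.
        right_half = row[x + width - half:x + width]
        row[x:x + half] = right_half[::-1]
    return image


frame = [
    [0, 1, 2, 3, 4, 5],
    [0, 1, 2, 3, 4, 5],
]
# Pretend a "face" was detected in columns 1 through 4 of both rows.
mirror_right_onto_left(frame, x=1, y=0, width=4, height=2)
```

In the real example the same idea is expressed with regions of interest on the frame, so no pixel data has to be copied around by hand in Python.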

Improvements

There are lots of improvements that could be made to the actual "problem" I was trying to solve. If you ran the above example code, you probably saw some jittering in the facial detection. It’s not much jitter, but the classifier has a bunch of parameters that I could tune, including minimum and maximum object size thresholds, and a neighbors parameter that controls how confident the classifier needs to be to designate something as a face.

I think there would probably be a much larger improvement if, instead of independently detecting faces in each frame, I changed the recognition to "follow" a face in successive frames once one was detected. This would probably fix some of the alignment issues when doing the flipping. Searching around a bit, there certainly seem to be quite a few examples that use this technique instead.

Beyond the particular fun I was trying to have, there’s lots of room for improvement in the bindings themselves. As I mentioned, the bindings mostly implement the particular functionality I needed to do the above, and no more. It’s essentially trivial, though, to add any other part of OpenCV that I’ve seen so far, which is quite promising. Adding a small Python API on top of that (as I did with some of the objects in the example) would then help clean things up a bit.

There is also likely a lot of room for improvement if I can avoid OpenCV’s memory management, but I haven’t been able to do so yet (have I mentioned how little C I know?). CFFI will be better able to manage garbage collection if I can allocate my own memory entirely.

What seems even more promising would be to hook up OpenCV-CFFI with numpypy and to operate on numpypy arrays. There’s some code in OpenCV’s own bindings to translate back and forth between numpy arrays and OpenCV’s array types, but I haven’t yet managed to either reimplement it or get access to it.
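As a rough illustration of the jitter problem mentioned above, here is a small sketch that smooths detected rectangles across frames with an exponential moving average. To be clear, this is a much cruder stand-in than actually following a face from frame to frame, it is not part of OpenCV-CFFI, and the function name, the (x, y, width, height) tuple layout, and the alpha parameter are all assumptions of mine.

```python
def smoothed(rectangles, alpha=0.3):
    """Yield exponentially smoothed (x, y, width, height) rectangles.

    `rectangles` is any iterable with one detected rectangle per frame.
    A smaller `alpha` trusts history more, which means less jitter but
    more lag when the face really does move.
    """
    previous = None
    for rectangle in rectangles:
        if previous is None:
            # Nothing to blend with yet, so start from the first detection.
            previous = tuple(float(value) for value in rectangle)
        else:
            previous = tuple(
                alpha * new + (1 - alpha) * old
                for new, old in zip(rectangle, previous)
            )
        yield previous


detections = [(100, 100, 50, 50), (110, 100, 50, 50)]
steady = list(smoothed(detections, alpha=0.5))
```

With alpha at 0.5, the second rectangle lands halfway between the two detections, so a face that twitches ten pixels sideways for one frame only moves five.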

Performance

Speaking of performance, in my short couple of hours I haven’t carefully benchmarked (or even sanity checked) the code I’ve written so far, so take any performance numbers with a grain of salt.

When running the example discussed above, though, and doing a rough FPS calculation, I can get around 20 fps doing the facial recognition on my 2013 iMac, but it seems like this iMac’s camera is limited to 30 fps overall. A faster camera would likely have an even higher overall frame rate after the recognition. If anyone reading this has one, I’d love to know how that goes.

Similarly, if I moved the recognition out of the loop that reads the frames, I could probably get that number even higher, but during the hackathon I had trouble reading directly from video devices without using OpenCV’s capture objects. (I briefly looked at v4l and /dev/video but quickly gave up when I couldn’t get them working on either a laptop I have available or on my Nexus 4 phone.)

Doing direct camera reads is probably necessary to do memory allocation with CFFI as well, since I don’t currently see an API in OpenCV that lets me hand it my own memory to put the frame into.
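For reference, a "rough FPS calculation" of the kind mentioned above can be as simple as the following sketch. The function name and the injectable clock are my own choices for illustration, not something from OpenCV-CFFI.

```python
import time


def frames_per_second(frames, clock=time.perf_counter):
    """Consume `frames` and return the average number of frames per second.

    `clock` is injectable so the measurement can be exercised without a
    camera; by default it is the standard library's high-resolution timer.
    """
    start = clock()
    count = 0
    for _ in frames:
        count += 1
    elapsed = clock() - start
    # Guard against a zero-length measurement on very short inputs.
    return count / elapsed if elapsed > 0 else float("inf")


# Simulate 40 frames taking 2 seconds by faking the clock readings.
ticks = iter([0.0, 2.0])
rate = frames_per_second(range(40), clock=lambda: next(ticks))
```

Since a camera stream never ends, a real measurement would need to bound the iterable first, for instance with itertools.islice over the generator of transformed frames.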

A Few Random Tidbits

There were a few quirks I encountered along the way when writing the bindings. This was my first real exposure to CFFI, so there were quite a few CFFI-related things I learned:

ffi.gc seems like the right way to attach deallocators to C data. Unfortunately, though, OpenCV’s allocators (at least the ones I’ve used so far) don’t pair up neatly with it: cvCreateImage, for example, returns an IplImage *, but its deallocator, cvReleaseImage, takes an IplImage **. ffi.addressof doesn’t exactly do the right thing there, for reasons I don’t fully understand, but I’d rather pursue doing my own memory allocation and deallocation with ffi.new anyway.

OpenCV makes a few uses of non-constant C macros. It defines, for example:

```c
#define CV_FOURCC_MACRO(c1, c2, c3, c4) \
    (((c1) & 255) + (((c2) & 255) << 8) + (((c3) & 255) << 16) + (((c4) & 255) << 24))
```

a macro it uses to create FourCCs, which are codes used to specify video output encodings. These obviously are not functions, so I can’t expose the macros directly via the #define and use them from Python. I could write a wrapper function (in C) that uses the macro, and expose the wrapper function. Helpfully, though, CFFI actually seems to be able to do this for me: I can just declare int CV_FOURCC_MACRO(int, int, int, int) and it will detect that I am wrapping a macro and define the appropriate wrapper function. The resulting function is not polymorphic, but for all the cases I had to deal with that was acceptable.
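The arithmetic in that macro is easy to reproduce in pure Python, which can be handy for checking what a wrapped version returns. This is only a sketch of the same bit-packing; the function name is made up, and it takes one-character strings rather than C chars.

```python
def fourcc(c1, c2, c3, c4):
    """Pack four one-character strings the way CV_FOURCC_MACRO packs chars.

    The first character ends up in the lowest byte of the result and the
    fourth in the highest, i.e. the code is little-endian.
    """
    codes = [ord(char) & 255 for char in (c1, c2, c3, c4)]
    return codes[0] + (codes[1] << 8) + (codes[2] << 16) + (codes[3] << 24)


mjpg = fourcc("M", "J", "P", "G")
```

Because of the byte order, the result is the same integer you would get from reading the four ASCII bytes as a little-endian 32-bit number.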

Text processing is annoying. There are a few simple manipulations that I needed to perform on the C source in order to copy and paste header files into CFFI without any hand editing at all. Besides the macro wrapping above, I had to remove some macros like CV_DEFAULT(12) (a macro OpenCV uses to declare parameter defaults for C++, but which expands to nothing in C), which meant doing some text manipulation on the source I was pasting into ffi.cdef. I wish it were easier to treat the C-ish source as an AST or the like and to operate more intelligently on the contents. (When I asked a bit about this, I learned that I really should be a bit more careful about writing the CFFI bindings than just copying and pasting out of the headers, for various reasons, so this is a bit of a longer topic. It still might be nice to have an auto-CFFI wrapper generator library.)
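To give a flavor of the kind of text manipulation involved, here is a small sketch that strips CV_DEFAULT(...) calls out of header text before it is handed to ffi.cdef. It is not the code from OpenCV-CFFI; the function name is hypothetical, and the declaration fragment in the example is abbreviated for illustration rather than copied verbatim from a header. A plain regular expression is not quite enough on its own, since the default value may itself contain parentheses, so the sketch counts them.

```python
import re


def strip_macro_calls(source, macro="CV_DEFAULT"):
    """Remove every `macro(...)` call from `source`, honoring nested parens.

    Any whitespace immediately before the macro is removed as well, so
    that `int flags CV_DEFAULT(0)` becomes `int flags`.
    """
    pattern = re.compile(r"\s*\b" + re.escape(macro) + r"\s*\(")
    pieces = []
    position = 0
    while True:
        match = pattern.search(source, position)
        if match is None:
            pieces.append(source[position:])
            return "".join(pieces)
        pieces.append(source[position:match.start()])
        # Walk forward until the parenthesis opened by the macro closes.
        depth = 1
        index = match.end()
        while index < len(source) and depth:
            if source[index] == "(":
                depth += 1
            elif source[index] == ")":
                depth -= 1
            index += 1
        position = index


declaration = "int flags CV_DEFAULT(0), CvSize min_size CV_DEFAULT(cvSize(0,0)));"
cleaned = strip_macro_calls(declaration)
```

This still treats the header as a flat string, so it would be fooled by a macro call inside a comment or a string literal, which is part of why working on an AST would be nicer.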