A while ago, I had a need to monitor filesystem modifications, and I looked around for Python bindings for the Linux kernel’s inotify subsystem. At the time, the only existing library was pyinotify, so being a lazy sort, I naturally tried to use it.

At first glance, the documentation seems impressive, and the API looks reasonable. Effective use of inotify is a subtle affair, however, and pyinotify is not, shall we say, the best tool for the job. Its problems are difficult to spot from external inspection, though, so here are a few notes from my experience.

Correctness

A program using pyinotify can easily lose track of parts of its directory hierarchy. The library doesn’t raise an OSError exception if the inotify_add_watch system call fails: instead, it propagates the -1 error result up to the caller as a value in a dict, but without the value of errno to tell the caller why the error occurred.

It’s thus trivial to miss errors entirely, because the usual mechanism of raising exceptions isn’t used. Almost as bad, it’s impossible to distinguish between recoverable (tried to add a watch on a directory that no longer exists) and fatal (hit the system max_user_watches limit) errors.
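To make the distinction concrete, here is a minimal sketch of what I would rather see: a failed call turned into an OSError that carries errno, so the caller can tell a vanished directory from an exhausted watch limit. The names check_watch_result, try_watch and RECOVERABLE are made up for this illustration, and the choice of which errno values count as recoverable is an assumption, not anything pyinotify or the kernel prescribes.

```python
import errno
import os

# Assumption for this sketch: failures the caller can shrug off, because the
# path vanished or is not accessible.
RECOVERABLE = {errno.ENOENT, errno.ENOTDIR, errno.EACCES}


def check_watch_result(wd, err, path):
    """Return wd, or raise an OSError carrying errno if the call reported -1."""
    if wd == -1:
        raise OSError(err, os.strerror(err), path)
    return wd


def try_watch(add_watch, path):
    """Call add_watch(path), which must return a (wd, errno) pair.

    Returns the watch descriptor, or None if the failure is recoverable.
    Anything else (for example ENOSPC, which inotify_add_watch reports when
    the max_user_watches limit is hit) propagates as an exception.
    """
    wd, err = add_watch(path)
    try:
        return check_watch_result(wd, err, path)
    except OSError as exc:
        if exc.errno in RECOVERABLE:
            return None
        raise
```

Here add_watch stands in for whatever thin wrapper actually makes the system call; the point is only that the error reaches the caller with its reason attached.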

Performance

To a regular Python hacker, the interface that pyinotify provides will probably look reasonable. If you want to handle some kind of event, just write a method that will get invoked with an Event object when that event occurs. How reassuringly normal.

Under the hood, though, the implementation is terrible. On every event, the library scans every event type that the inotify interface could possibly report, and checks to see if your class implements one of several possible appropriately named methods. This means it’s traversing a 20-element dict, and performing up to 60 attribute lookups (of which up to 40 are based on %-formatted names), for every reported event.

This has disastrous performance implications. If you write a simple monitoring tool that uses pyinotify, use it to monitor activity in a Linux kernel source tree, and then start a build in that tree, try running top while your build runs. When I did this, I found that pyinotify was consuming an entire CPU trying to keep up with the flood of notification events.
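The shape of the problem can be sketched in a few lines. This is not pyinotify’s actual code: the classes, the three-entry table and the process_<EVENT> naming are all simplifications made up for illustration. The first class hunts for handlers by name on every event; the second does the same hunt once, at construction time.

```python
# A cut-down event table for illustration; the real interface has many more.
EVENT_BITS = {"IN_MODIFY": 0x002, "IN_CREATE": 0x100, "IN_DELETE": 0x200}


class SlowDispatcher:
    """Searches for handler methods by name on every single event."""

    def dispatch(self, mask, name):
        for event_name, bit in EVENT_BITS.items():
            if mask & bit:
                # A fresh string format and attribute lookup per event.
                handler = getattr(self, "process_%s" % event_name, None)
                if handler is not None:
                    handler(name)


class FastDispatcher:
    """Resolves handler methods once, when the object is created."""

    def __init__(self):
        self._handlers = [
            (bit, getattr(self, "process_%s" % event_name))
            for event_name, bit in EVENT_BITS.items()
            if hasattr(self, "process_%s" % event_name)
        ]

    def dispatch(self, mask, name):
        # Only the handlers that actually exist are ever visited.
        for bit, handler in self._handlers:
            if mask & bit:
                handler(name)


class CreationLogger(FastDispatcher):
    """Example subclass that records the names of created entries."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def process_IN_CREATE(self, name):
        self.seen.append(name)
```

Both dispatchers offer the same friendly interface to the subclass author; the only difference is where the name-based searching happens.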

Locking

All that needless attribute lookup churn isn’t the only problem: pyinotify uses a threading.RLock to protect every access to every attribute of its Watch class, by providing its own __getattribute__ and __setattr__ methods.

I can’t guess what the author thinks he’s protecting himself from, but he’s got a solid defence mounted against both correctness and performance there. (Blindly locking individual attributes isn’t going to protect the consistency of an entire data structure, and delegating responsibility for locking out to callers, who are probably all single-threaded anyway, might help to recover a bit of the execrable performance. Watch isn’t often on the fast path, thank goodness.)

Is it possible to do better?

A potential rejoinder to my performance criticisms is that Python isn’t a fast language. However, this doesn’t bear up in general: I’ve written plenty of nippy Python code. In this particular case, in response to my mounting horror at reading and fixing the pyinotify source, I wrote bindings of my own. In contrast to pyinotify consuming an entire CPU during moderately heavy filesystem activity, an app using my bindings consumes about 5% of a CPU, even in the face of intensive activities like untarring a big file archive.

In part, this is because my bindings are less abstracted than those of pyinotify. I don’t dispatch out to user methods at all; the caller is responsible for checking a bitmask instead. The readability of application code isn’t really affected by this, but stripping out all the cruft massively improves performance.
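For a sense of what checking a bitmask looks like in application code, here is a small sketch. The constants are the bit values from the Linux inotify header; the describe function is a hypothetical helper written for this example, not part of my bindings.

```python
# Bit values from the Linux <sys/inotify.h> header.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_Q_OVERFLOW = 0x00004000
IN_ISDIR = 0x40000000


def describe(mask, name):
    """Turn one (mask, name) pair into a short human-readable string."""
    if mask & IN_Q_OVERFLOW:
        return "queue overflowed; rescan needed"
    kind = "directory" if mask & IN_ISDIR else "file"
    if mask & IN_CREATE:
        return "%s created: %s" % (kind, name)
    if mask & IN_DELETE:
        return "%s deleted: %s" % (kind, name)
    if mask & IN_MODIFY:
        return "%s modified: %s" % (kind, name)
    return "ignored"
```

A handful of integer tests per event is about as cheap as Python gets, and it reads no worse than a set of handler methods.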

In addition, the application itself is also responsible for using the library in an informed way. To get decent performance with inotify, you must delay calls to read so that the kernel has a chance to aggregate multiple notifications into a single buffer write. In other words, if a call to poll says “you’ve got events”, you have to wait a good fraction of a second before seeing what they are. I provide a Threshold class to help with this.

While it is certainly possible to call into pyinotify in a similarly informed way, I suspect that all its flab and abstraction will gull the unwary coder into thinking that maybe they’re not writing performance-critical code after all, when in fact they are.

There are other Python inotify interfaces available. One is, like mine, named python-inotify, but a quick glance at its source code revealed some of the same silliness with unnecessary locking that plagues pyinotify, so I quickly averted my eyes. There’s also a Python API to gamin. I have no opinion about it, beyond not wanting to run another daemon if I can avoid it.

My general advice would be to avoid writing code that involves monitoring filesystem activity. It’s all too easy to write code that looks sensible, but is actually racy, usually under circumstances that are difficult to reproduce. Tuning performance without introducing more races or bugs is tough. You’re getting the idea now: hard! scary! find something fun instead!

The corollary to this is, of course, that as a user, you ought to be suspicious of any programs you use that monitor filesystem activity. I bet the Beagle and Google Desktop teams have armloads of horror stories.