Javaism, Exceptions, and Logging: Part 1

I'm working nowadays on refactoring a large Python codebase at my workplace, and I wanted to share some of my insights over two some aspects of large-scale projects: exceptions, logging, and a bit on coding style. Due to it's length, I decided to split it over three installments; the first covers Javaism and an introduction to the issues of working with exceptions. Part 2 suggests "best practices" concerning exceptions in Python, and part 3 will cover logging (when, how, and how much).

Javaism

From my long experience in the programming world, I get the feeling that many programmers (even those fluent in Pythonspeak) come from a rich Java/.NET background, where they've acquired their programming skills and mind-set. Now converted-Pythonists, they are still the "speakers" of a second language, and they can't deny their mother tongue. In the context of this post, I'll refer to this as Javaism, or thinking Java in Python. Of course it might as well be C# or C++, but Java is the umbrella term.

You don't have to go far to see examples of it, for Javaism didn't skip Python's standard library: modules/packages such as logging , unittest and threading where ported almost isomorphically from Java. On the surface, you might encounter camelCase names ( getLogger ), but the verbosity and over-complicated nature of Java and its inheritance methodology can be seen anywhere. For instance, recall the complexity of setting up a logger (I have to look it up every time), or the threading.Thread class... I really don't wish to digress here, but I feel that a concrete example would make this point clear:

The canonical way to write thread functions is by subclassing Thread and implementing run() ; you can pass a callback (called target), but it seems like an afterthrought (for once, it's not the first argument of the constructor). Recall that anonymous classes weren't always there in Java, so extending a class was the shortest way to emulate first-class functions.

Speaking of the Thread constructor -- it takes so many optional arguments ( group ?!), but none for daemon . For that, you have to imperatively call setDaemon() afterwards. Why? Because Java doesn't have keyword arguments, so by adding an additional argument you double the number of overloaded constructors (exponential growth), while get/set properties behave "linearly".

Also, you first instantiate the thread object, then start() it... where's the sense in that? What can you do with an unstarted thread object (other than calling setDaemon )? Consider Popen or file() -- the one, obvious way that Python follows is resource instantiation is acquisition. Why introduce useless states in the lifetime of the object? You just lay the ground for unexpected code flows that may result in bugs. What can you do with an unstrarted thread? Pass it on to someone else, who will start it. That's a very general concept, called partial application, or currying. Python has lambda functions and partial , but Java doesn't... The easiest way to create such "deferred" objects is to add "deferredness" as an internal state of the object -- it's a common Java practice, uncalled for in Python.

It's funny how the limitations of Java were ported to Python as well, and made the implementation ugly -- it's a classical case of "I don't want to think, let's just copy an existing solution". Luckily, it seems to be a thing of the past -- it only entered stdlib in the old days, before the community had a clear notion of what being pythonic meant. But still, Javaism of all degrees is widespread, especially in corporate-developed large-scale projects (Zope and twisted, to name a few, but naturally closed-source corporate-internal projects are even more susceptible).

Exceptions

Java had a good insight (that they most likely stole from some other language) in that exceptions are part of a function's signature. Just like a function takes an argument of type T1 and returns a result of type T2, it also has "side channels" through which it can produce results, known as exceptions, which should be documented and checked as well.

The problem is, trying to foresee everything that might ever go wrong is a futile attempt, and even Java itself makes two exceptions (no pun intended): Error , for unrecoverable exceptions (such as VirtualMachineError ), and RuntimeException , for exceptions that may always occur (such as NullPointerException ).

The fundamental idea is correct, but it was bound to fail: first, because people are lazy, but most importantly, because trying to predict all unexpected, future edge cases is absurd. For instance, suppose you're implementing an interface that stores data (say, in files), so you might find yourself implementing a signature such as void write(byte[] data) throws IOException . Now suppose your implementation uses a third-party database engine, that throws MySQLException . For obvious reasons, MySQLException does not derive from IOException , and there's nothing you can do about it, as both the interface and the DB engine are given to you. You're now faced with three options:

Translate MySQLExceptions into IOExceptions When designing interfaces, always declare the most general exception in throws clauses When implementing libraries, always derive your exceptions from an unchecked exception ( RuntimeException )

In short -- you need to find a workaround and by-pass the compiler's checking. This essentially means that throws should have served for documentation-only purposes, where the compiler might produce (suppressible) warnings should you not follow conventions. It's more of a semantic property, like idempotence or thread-safety... you may state it in your Javadoc, but you wouldn't expect the compiler to enforce that (not in a language like Java, anyway).

I'd guess most people agree that the second and third options are "inherently bad", but opinions diverge on the first one. I will try to show that exception-wrapping (translating exceptions) is just as bad -- at least when it comes to Python. We'll cover this in part 2.