The Grumpy Editor's Python 3 experience

LWN has been running articles for years to the effect that the end of Python 2 is nigh and that code should be ported to Python 3 immediately. So, naturally, one might expect that our own site code, written in Python, had been forward-ported long ago. Strangely enough, that didn't actually happen. It has mostly happened now, though. In the process of doing this work, your editor has noticed a few things that don't necessarily appear in the numerous porting guides circulating on the net.

One often-heard excuse for delaying this work is that one or more dependencies have not yet been ported to Python 3. For almost everybody, that excuse ran out of steam some time ago; if a module has not been forward-ported by now, it probably never will be and other plans need to be made. In our case, the final dependency was the venerable Quixote web framework which, due to the much appreciated work of Neil Schemenauer, was forward-ported at the end of 2017. Quixote never really took the world by storm, but it makes the task of creating a code-backed site easy; we would have been sad to have to leave it behind.

Much of the anxiety around moving to Python 3 is focused on how that language handles strings. The ability to work with Unicode was kind of bolted onto Python 2, but it was designed into Python 3 from the beginning. The result is a strict separation between the string type (str), which holds text as Unicode code points, and bytes, which contains arbitrary data, including text in a specific encoding. Python 2 made it easy to be lazy and ignore that distinction much of the time; Python 3 requires a constant awareness of which kind of data is being dealt with.

In practice, for LWN at least, Unicode is not where the problems arose. The standard advice is to use bytes for encoded strings originating from (or exiting to) the world outside a program, while converting to (or from) str at the boundary, thus using only str internally. That forces a focus on how one is communicating with the environment — a focus that really needs to be there anyway. It is not a hard discipline to acquire, and it leads to more robust code overall.
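As a minimal sketch of that boundary discipline (the function name and the UTF-8 default are assumptions for illustration, not anything taken from the LWN code), the pattern is to decode on the way in, work on str, and encode on the way out:

```python
def shout(raw, encoding="utf-8"):
    """Hypothetical boundary function: bytes in, bytes out, str in between."""
    text = raw.decode(encoding)      # bytes -> str at the input boundary
    result = text.upper()            # all internal work happens on str
    return result.encode(encoding)   # str -> bytes at the output boundary


# A str counts code points; bytes counts encoded octets.
code_points = len(u"\u00e9")                # one character
octets = len(u"\u00e9".encode("utf-8"))     # two bytes in UTF-8
```

The encoding is named explicitly at both edges, which is where a wrong guess about the outside world would otherwise hide.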

So text encodings aren't a big challenge except — in your editor's experience — for a couple of places, one of which is the email module, which has proved to be the reason for the most version-dependent code in this particular project. Much of that is due to API changes in that module, most of which are probably justified for proper email handling even if they are annoying in the short term. But there is also the simple problem that one cannot hide the text-encoding issue when dealing with email. It's not just that a message can arrive in an arbitrary encoding: a single message can contain text in multiple encodings — in a single header line. Properly processing such email is arguably easier and more correct in Python 3, but it's different from Python 2 in subtle ways that took a while to figure out.
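To make the multiple-encodings point concrete, here is a small sketch using the standard library's email.header.decode_header(). The header value is invented for illustration; it packs a UTF-8 encoded word and an ISO-8859-1 encoded word into one line.

```python
from email.header import decode_header

# Made-up header value with two RFC 2047 encoded words in different charsets.
RAW_SUBJECT = "=?utf-8?q?Gr=C3=BC=C3=9Fe?= =?iso-8859-1?q?_aus_K=F6ln?="


def header_to_text(raw):
    """Decode each chunk with its own charset and join the results."""
    pieces = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Encoded chunks arrive as bytes plus the charset they declared.
            pieces.append(chunk.decode(charset or "ascii"))
        else:
            # A header with no encoded words comes back as plain str.
            pieces.append(chunk)
    return "".join(pieces)


subject = header_to_text(RAW_SUBJECT)
```

Each chunk has to be decoded with the charset it declares; there is no single encoding that would be right for the whole line.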

Another problem has put your editor in a pickle, literally. The Python pickle module is a convenient way to serialize objects, but it has always been loaded with traps for the unwary. Pickle in Python 2 could be relied upon to generate pickles that could be treated as strings, especially if the oldest "protocol" was used. In Python 3, pickles are bytes, and they are not friendly toward any attempt to treat them as strings. Even the "human readable" protocol=0 mode will produce distinctly non-readable output for some types; these include things like NUL bytes that trip up even the relatively oblivious Latin-1 decoder. The datetime type is prone to this kind of problem, for example.
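A short sketch of the problem and of one possible workaround. The base64 wrapper below is an assumption about how one might store a pickle in a text field; it is not a description of what the LWN code does, and the helper names are hypothetical.

```python
import base64
import pickle
from datetime import datetime

stamp = datetime(2018, 2, 13, 12, 0, 0)

# In Python 3 a pickle is bytes, even with the oldest protocol.
payload = pickle.dumps(stamp, protocol=0)
has_non_ascii = any(byte > 0x7F for byte in payload)


def pickle_to_text(obj):
    """Hypothetical helper: wrap the pickle bytes in base64 to get real text."""
    return base64.b64encode(pickle.dumps(obj, protocol=2)).decode("ascii")


def text_to_object(text):
    """Inverse of pickle_to_text(); only safe on trusted data, like any unpickle."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


restored = text_to_object(pickle_to_text(stamp))
```

The datetime state includes the year as raw bytes, so the "readable" protocol-0 output is not plain ASCII here, while the base64 form can be stored anywhere a string can.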

One solution is to paint "PICKLES ARE NOT STRINGS" on one's monitor and to resolve never to be so sloppy again. But pickles have other problems, including sometimes surprising behavior when one pickles an object under Python 2, then tries to unpickle it under Python 3, where the definition of the object's class may have changed considerably. Your editor has concluded that pickles are an attractive way to avoid defining a proper persistence mechanism for Python objects, but that taking that shortcut leads to problems in the long run.
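What a "proper persistence mechanism" looks like will vary. One simple possibility, sketched here with an invented record layout (the field names and version number are hypothetical), is an explicit, versioned JSON format that does not depend on any class definition:

```python
import json
from datetime import date

FORMAT_VERSION = 1  # hypothetical schema version, bumped when fields change


def dump_record(name, expires):
    """Serialize an illustrative (name, expiry date) record to JSON text."""
    return json.dumps(
        {"version": FORMAT_VERSION, "name": name, "expires": expires.isoformat()},
        sort_keys=True,
    )


def load_record(text):
    """Parse the JSON text back, rejecting unknown schema versions."""
    data = json.loads(text)
    if data.get("version") != FORMAT_VERSION:
        raise ValueError("unsupported record version: %r" % data.get("version"))
    year, month, day = (int(part) for part in data["expires"].split("-"))
    return data["name"], date(year, month, day)


saved = dump_record("subscriber", date(2019, 1, 31))
```

The stored form is ordinary text, and the loader states exactly which fields it expects, so a change in the class layout on either side of a Python upgrade shows up as an explicit error rather than a surprise.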

Yet another inspiration for high levels of grumpiness is the change in how module importing works. In Python 2, a line like:

import mydamnmodule

would find mydamnmodule.py in the same directory as the module doing the import. That behavior was evidently too convenient to survive into Python 3, so it was taken out. The documentation gives some lame excuse about confusion between modules located this way and standard-library modules, but your editor knows that a more mean-spirited motive must have driven such a change.

Now, one can try to fix such code with an explicit relative import:

from . import mydamnmodule

In many situations, though, that will lead to the dreaded "attempted relative import in non-package" exception that has been the cause of a seemingly infinite series of Stack Overflow postings. Once again, the rules must make sense to somebody, but they make this kind of relative import nearly impossible to use.

So there was nothing for it but to actually get a handle on the namespaces in use and change all the import statements into proper absolute form. Doing so revealed some interesting things. The lazy way in which we had set up our hierarchy was silently causing modules to be imported multiple times (as foo, lwn.foo, and even lwn.lwn.foo, for example), unnecessarily bloating the size of the running program. Such imports can also create difficult-to-debug havoc if any modules maintain module-level state that will also be duplicated and, naturally, become inconsistent.
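The duplicate-import effect is easy to reproduce. The following sketch builds a throwaway package in a temporary directory (demo_pkg and demo_state are invented names) and puts both the parent directory and the package directory on sys.path, which is an assumption about how such a setup might come about:

```python
import importlib
import os
import shutil
import sys
import tempfile


def demo_duplicate_import():
    """Import one source file under two names and show the state diverging."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w"):
        pass  # an empty file is enough to mark the package
    with open(os.path.join(pkg_dir, "demo_state.py"), "w") as handle:
        handle.write("counter = 0\n")

    # Both the parent directory and the package directory are searchable.
    sys.path[:0] = [root, pkg_dir]
    try:
        importlib.invalidate_caches()
        short = importlib.import_module("demo_state")
        full = importlib.import_module("demo_pkg.demo_state")
        short.counter += 1  # only one of the two copies sees this change
        return short is full, short.counter, full.counter
    finally:
        sys.path.remove(root)
        sys.path.remove(pkg_dir)
        shutil.rmtree(root, ignore_errors=True)


same_module, short_counter, full_counter = demo_duplicate_import()
```

The same file ends up as two distinct module objects, each with its own copy of the module-level counter.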

Moving to well-defined absolute imports fixed those issues, but revealed another that had been hidden: the presence of a number of import loops in the code. These loops, where module A imports B which, in turn (and possibly through several layers of indirection) tries to import A, lead to a "can't import" exception. They are almost always an indication of code structure that, to put it charitably, could use a little more thought. Fixing those required a fair amount of refactoring, profanity, and slanderous thoughts about the Python developers.

The truth, though, is that these issues should have been fixed long ago; the end result of the import change is a much improved code structure here.

Some of the more annoying language changes really do seem like gratuitous attacks on people who have to maintain code over the long term, though. Python 2 did the Right Thing with source files containing both spaces and tabs, for example, while Python 3 throws a fit. The problem is easily fixed, but it seems like it didn't need to be a problem in the first place. Since time immemorial, octal constants have been written with a preceding zero — 0777, for example. Python 3 requires one to write 0o777 instead, for reasons that are not particularly clear. But JavaScript made that change too, so it must be the right thing to do.
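The octal change can be checked directly from Python 3. This is just an illustrative snippet; the helper name is made up.

```python
def parses_as_expression(source):
    """Return True if the source text compiles as a Python expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True


old_style_ok = parses_as_expression("0777")    # rejected by Python 3
new_style_ok = parses_as_expression("0o777")   # accepted

mode = 0o777                     # the new spelling
mode_from_text = int("777", 8)   # parsing octal digits from a string
```

The 0o777 form is also accepted by Python 2.6 and 2.7, so it can be adopted before the switch.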

At least old-style octal constants will generate a syntax error in Python 3, so there is no chance of subtle problems resulting from those constants being interpreted as decimal. The same is not true of integer division. Python 2 defined integer division as originally intended by $DEITY and implemented by almost every processor: the result is a rounded-downward integer value. So 3/2 == 1. In Python 3, instead, dividing integers yields a floating-point result: 3/2 == 1.5. That is a change that could silently create subtle problems. In the LWN code, integer division is used for tasks like subscription management and money calculations; these are not places where mistakes can be afforded.
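A small illustration of the difference, with a hypothetical money-splitting helper (amounts in integer cents are an assumption made for the example, not a description of LWN's billing code):

```python
true_quotient = 3 / 2       # float result in Python 3
floor_quotient = 3 // 2     # integer result, rounded down
negative_floor = -3 // 2    # rounds toward negative infinity, not toward zero


def split_cents(total_cents, parts):
    """Split an integer amount into integer shares that sum to the total."""
    base, extra = divmod(total_cents, parts)
    # The first `extra` shares each carry one leftover cent.
    return [base + 1 if index < extra else base for index in range(parts)]


shares = split_cents(1000, 3)
```

Keeping money in integers and using // or divmod() avoids both the silent float result and any rounding drift; the shares always add back up to the original amount.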

The fix is easy enough on its face: use // for true integer division. But that requires finding every place that needs to be fixed. Grepping " / " in a large code base is not particularly fun, especially if said code base also includes a lot of HTML. This work has been done, but it is going to take a lot of testing before your editor is confident with the results.
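As an alternative to grep (not what was done here, and only applicable to files that already parse as plain Python 3, so not to Quixote templates), the standard ast module can list just the true-division operators and ignore slashes inside strings. A rough sketch:

```python
import ast


def find_true_divisions(source, filename="<string>"):
    """Return sorted line numbers where `/` or `/=` is used as an operator."""
    tree = ast.parse(source, filename)
    lines = []
    for node in ast.walk(tree):
        is_operator_node = isinstance(node, (ast.BinOp, ast.AugAssign))
        if is_operator_node and isinstance(node.op, ast.Div):
            lines.append(node.lineno)
    return sorted(lines)


SAMPLE = (
    "a = 7 / 2\n"
    "b = 7 // 2\n"
    "c = '<a href=\"/x/\">link</a>'\n"
    "d = 1\n"
    "d /= 2\n"
)
hits = find_true_divisions(SAMPLE)
```

Each reported line still needs a human decision about whether // or / is the intended operation; the tool only narrows the search.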

There are numerous other little incompatibilities that one stumbles across, naturally. Some library modules have changed or are no longer present. The syntax of the except statement is different. Dictionaries no longer have has_key(). And so on. Most of these are relatively easy to catch and fix, though; they are just part of a day's work.
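Two of those small changes, shown side by side with the old spellings in comments. The function names are invented for the example; both of the new forms also run under Python 2.6 and later.

```python
def lookup(table, key, default=None):
    """Membership test with `in` instead of the removed dict.has_key()."""
    # Python 2 only: if table.has_key(key):
    if key in table:
        return table[key]
    return default


def parse_count(text):
    """Exception handling with `as` instead of the old comma form."""
    try:
        return int(text)
    # Python 2 only: except ValueError, exc:
    except ValueError as exc:
        return "bad count: %s" % exc.__class__.__name__
```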

One might wonder about the various tools that are available to help with this transition. The 2to3 tool can be useful for finding some issues, but it wants to translate the code outright, generating a result that no longer runs under Python 2. That is a bigger jump than your editor would like to take; the strategy has very much been to get the code working under both versions of the language before making the big switch. 2to3 also chokes on the Quixote template syntax that is used by much of LWN's Python code. So it was of limited use overall.

An alternative is the six compatibility library, which can be useful for writing code that works under both Python versions. Your editor steered away from six instinctively, though, due to a kernel programmer's inherent dislike for low-level, behind-the-scenes magic. It reworks the module namespace, overrides functionality in surprising places, and requires coding in a version of the language that is neither 2 nor 3. Various versions of six bundled with dependencies have already led to problems even in the Python 2 version of the code. It is better, in your editor's opinion, to have the transitional compatibility code be in one's face, where it can be left behind once the changeover is complete. The increasing number of Python 3 features added to 2.7 makes it easier to write portable code, in any case.
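What "in one's face" compatibility code might look like is sketched below. This is an illustration under assumptions, not the actual LWN code: the names PY3, text_type, and to_text are invented, and the __future__ line is shown as a comment because it must be the first statement of a real module.

```python
import sys

# In a module that must run on Python 2.7, the first statement would be:
#   from __future__ import absolute_import, division, print_function

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name exists only on Python 2)


def to_text(value, encoding="utf-8"):
    """Return text on either version: decode bytes, pass text through."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)
```

Everything version-dependent sits behind one visible flag, so a search for PY3 finds all of it when the time comes to delete the Python 2 branch.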

All told, the Python 3 transition has been an adventure — one that is not yet complete. It has taken a lot of time that was already in short supply. The end result, though, is cleaner code written in a better version of the language, or so your editor believes, anyway. The Python 2 code base put in over 16 years of service; hopefully the next version will be good for at least that long.

