Writing Forwards Compatible Python Code

For web applications, the safest bet currently is to stick with Python 2.x, even for new projects, for the simple reason that we do not yet have enough supporting libraries for Python 3 and porting some of them over is a huge step. But with everyone saying that upgrading to Python 3 is hard and painful, how does one make this upgrade easier?

For high-level applications, an upgrade is actually quite simple if the application can trust its supporting libraries to behave consistently once they are ported to Python 3. In fact, there is no reason why an upgrade to Python 3 shouldn't be painless. So here is a list of dos and don'ts for writing new code.

2.6 is your Baseline

For new projects, start with Python 2.6 or 2.7. They provide a lot of things that make an upgrade to Python 3 easier. If you don't have to support older versions of Python, you can already use much of what is in Python 3 by explicitly opting in. You should use the following things from __future__:

- division. I must admit that I hate the future division import in Python 2. It constantly makes me jump to the beginning of the file to check which division mode is active for a module when I do code review. However, because it is the default (and the only mode) in Python 3, you really should be using it nowadays.

- absolute_import. The most important one. No longer will from xml import bar import the module foo.xml from the foo package when you are inside it. Instead, you explicitly have to write from .xml import bar to get what you want. Less confusion and incredibly helpful.

Regarding the print-as-a-function future import, I recommend against using it to avoid confusion. Because all editors currently highlight print as a keyword, it can become confusing quickly. Generally, if something behaves differently in different files, it's a good idea to avoid it if possible. The great aspect of the print change is that it can be reliably converted with 2to3, so there is really no reason to use the print_function future import.

While it might be appealing, better not use the unicode_literals future import, for the very simple reason that many APIs are changing the supported string types in different places, which makes unicode_literals counterproductive. There are of course places where this future import is useful, but that's more limited to lower-level interfaces (libraries), and those can't use that import anytime soon anyway because it only came with Python 2.6.

To get access to the b'foo' literal you do not need this specific import. It is available either way and is a great help for 2to3.
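As a rough illustration of the opt-ins recommended above, a module header could look like the following sketch. The function names average and page_count and the MAGIC constant are made up for this example; only the two future imports and the b'' literal come from the text.

```python
from __future__ import division, absolute_import

# With absolute_import active, a sibling module inside a package has to be
# imported explicitly, for example: from .xml import bar


def average(values):
    # "/" is true division in this module, on Python 2.6+ and on Python 3.
    return sum(values) / len(values)


def page_count(items, per_page):
    # "//" is floor division in both modes; negating twice rounds up.
    return -(-items // per_page)


# Bytes literals need no future import on 2.6 and later.
MAGIC = b'\x89PNG'
```

Because the future imports have to be the first statements in the file, a reviewer at least knows exactly where to look for the active division mode.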

File IO and Unicode

File IO changed greatly in Python 3. Thankfully, if you are designing new APIs for new projects, you can save yourself a lot of hassle by explicitly deciding on unicode.

If you are dealing with text data, use the codecs.open function for opening the files. Assume UTF-8 unless a different encoding is explicitly specified, and operate on unicode strings only. For binary IO, make sure to open the file with 'rb' instead of 'r' and you are set. That was already required for proper Windows support anyway.

If you are doing byte-based data processing, mark strings that are bytes only with b'foo' instead of 'foo', which tells 2to3 not to convert these string literals to unicode. Please be aware of the following differences between Python 2.6:

    >>> b'foo'
    'foo'
    >>> b'foo'[0]
    'f'
    >>> b'foo' + u'bar'
    u'foobar'
    >>> list(b'foo')
    ['f', 'o', 'o']

and Python 3 regarding byte strings:

    >>> b'foo'[0]
    102
    >>> b'foo' + 'bar'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can't concat bytes to str
    >>> list(b'foo')
    [102, 111, 111]

As a replacement for the above Python 2 idioms, you can use this instead:

    >>> b'foo'[0:0 + 1]
    b'f'
    >>> b'foo' + 'bar'.encode('latin1')
    b'foobar'
    >>> to_charlist = lambda x: [x[c:c + 1] for c in range(len(x))]
    >>> to_charlist(b'foo')
    [b'f', b'o', b'o']

These will work on both 2.6 and 3.x (on 2.6 the results are displayed without the b prefix).
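The file handling advice above might be sketched like this. The helper names write_text, read_text and read_bytes are hypothetical and exist only for this example; codecs.open and the 'rb' mode are the pieces the text actually recommends. The sample text is built by decoding a byte string so that the snippet needs no u'' literal.

```python
import codecs
import os
import tempfile


def write_text(path, text, encoding='utf-8'):
    """Encode a unicode string and write it to path."""
    with codecs.open(path, 'w', encoding=encoding) as f:
        f.write(text)


def read_text(path, encoding='utf-8'):
    """Return the file contents decoded to a unicode string."""
    with codecs.open(path, 'r', encoding=encoding) as f:
        return f.read()


def read_bytes(path):
    """Return the raw file contents; note 'rb' instead of 'r'."""
    with open(path, 'rb') as f:
        return f.read()


# "cafe" with an accented e: four characters, five bytes in UTF-8.
sample = b'caf\xc3\xa9'.decode('utf-8')

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
write_text(path, sample)

text = read_text(path)   # unicode string
raw = read_bytes(path)   # byte string
```

Keeping the encoding as a keyword argument with a UTF-8 default follows the "assume UTF-8 unless told otherwise" rule while still letting callers override it.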

Better Safe than Sorry

There are a couple of things where 2to3 will be pretty counterproductive. Some of these are cases where 2to3 seems to have a bug, others are cases where it just does not know enough about your code to make proper predictions.

Recursion Error with str

A lot of people are using code like this on Python 2:

    class Foo(object):
        def __str__(self):
            return unicode(self).encode('utf-8')

        def __unicode__(self):
            return u'Hello World'

2to3 assumes that your API is not unicode compatible and will convert it to this:

    class Foo(object):
        def __str__(self):
            return str(self).encode('utf-8')

        def __unicode__(self):
            return 'Hello World'

Now this is just wrong. First of all, __unicode__ is unused in Python 3. Secondly, __str__ now calls into itself and will trigger a runtime error because of recursion when str() is called on an instance of Foo. This can be solved with either a custom 2to3 fixer or a little helper class that checks for Python 3:

    import sys

    class UnicodeMixin(object):
        # The branch is chosen once, when the class body is executed.
        if sys.version_info > (3, 0):
            # Python 3: str is unicode, so hand back __unicode__() directly.
            __str__ = lambda x: x.__unicode__()
        else:
            # Python 2: __str__ has to return a byte string.
            __str__ = lambda x: unicode(x).encode('utf-8')

    class Foo(UnicodeMixin):
        def __unicode__(self):
            return u'Hello World'

That way your object will still have a __unicode__ attribute on Python 3, but that will not do any harm. When you later want to drop Python 2 support, you just have to go over all subclasses of UnicodeMixin, rename __unicode__ to __str__ and remove the helper class.

String Comparisons

This problem is a little more tricky. In Python 2 the following is true:

    >>> 'foo' == u'foo'
    True

Not so in Python 3:

    >>> b'foo' == 'foo'
    False

What's worse here is that Python 2 does not emit a warning on such comparisons (neither with nor without the Python 3 warnings flag), and neither will Python 3 by default (it only issues a BytesWarning when started with the -b flag). So how can you spot these cases?
I wrote a small helper module called unicode-nazi which, once imported, will warn automatically if you do something that is not purely a unicode or bytestring operation:

    >>> import unicodenazi
    >>> u'foo' == 'foo'
    __main__:1: UnicodeWarning: Implicit conversion of str to unicode
    True

But be aware that this module is very noisy and has a noticeable runtime overhead.
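Once such a mixed comparison has been found, one possible way to make it behave the same on both versions is to normalize both sides to unicode before comparing. The helpers to_text and text_equal below are hypothetical names for this sketch, and UTF-8 as the default encoding is an assumption.

```python
def to_text(value, encoding='utf-8'):
    """Decode byte strings; pass unicode strings through unchanged."""
    # On Python 2.6+ "bytes" is an alias for str, on Python 3 it is the
    # real bytes type, so this check works on both.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def text_equal(left, right, encoding='utf-8'):
    """Compare two strings after normalizing both sides to unicode."""
    return to_text(left, encoding) == to_text(right, encoding)


same = text_equal(b'foo', 'foo')        # True on Python 2 and Python 3
different = text_equal(b'foo', 'bar')   # False on both
```

This only papers over individual comparisons. Deciding on unicode at the API boundary, as recommended earlier, avoids the need for it in most places.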

What is a String?

Here is a table of things that are bytestrings in Python 2 and what they usually become in Python 3:

    Type                          Type in Python 3 (unicode == str)
    ----------------------------  ------------------------------------------------
    Identifiers                   unicode

    Docstrings                    unicode

    __repr__                      unicode

    String keys of dictionaries   unicode

    WSGI environment keys         unicode

    HTTP header values,           unicode, limited to ASCII in 3.1 and limited
    WSGI environment values       to latin1 in 3.2

    URLs                          unicode, but some APIs also accept byte strings.
                                  Special attention: your URLs have to be encoded
                                  in UTF-8 in order to use all of the standard
                                  library functions.

    Filenames                     unicode or bytes. Most APIs accept both, but
                                  implicit conversions are not supported.

    Binary contents               bytes or bytearray. Beware: the second type is
                                  mutable, so you can end up with a string-ish
                                  object that is mutable.

    Python code                   unicode. You have to decode the source yourself
                                  when you pass it over to exec.