How to choose between Python 2 and 3?

The answer is not to choose until your hand is forced

As a Python programmer, I always face the following question when starting a new project:

Should I develop in Python 2 or Python 3?

The answer seems obvious at first glance. Python 3 is the only actively developed branch of the language, and support for Python 2.7 will be discontinued after 2020. That is a compelling reason to develop in Python 3.

However, the decision is not that easy. The availability of third-party libraries is one of my major concerns, so I have to factor it in when making the choice. What could be worse than discovering a package that could save me a month’s worth of work, only to find that I can’t use it because I chose the wrong fork of the language?

I know there are several other major differences between Python 2 and Python 3, such as Unicode handling, exception handling and import handling, not to mention substantial changes in the standard library. If you are interested, you can read extensive breakdowns of the differences in this pro-Python 3 blog post, in this neutral blog post, and in this pro-Python 2 propaganda and its rebuttal. But in my opinion, the advantages or disadvantages these factors offer are often dwarfed by the impact of the third-party libraries that do most of the hard work in a project.

Python 3 compatibility of the 360 most popular Python packages as of December 2016.

Thankfully, the situation with libraries isn’t too bad. Most popular Python packages are compatible with both Python 2 and 3. The real problem lies with the less popular packages that serve niche use cases. Every so often these days, I stumble upon an excellent third-party library that was originally written in Python 2 but never got ported to Python 3. Similarly, many libraries were written in Python 3 from the get-go and therefore don’t really support Python 2.

In an ideal world, I could list the packages a project might need before starting it and then choose between 2 and 3 based on their compatibility. In fact, this is the official advice from the Python wiki:

If you want to use Python 3.x, but you’re afraid to because of a dependency, it’s probably worthwhile doing some research first.
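If you do want to follow that advice, one rough starting point is the set of trove classifiers each candidate package declares on PyPI. The sketch below assumes you have already collected those classifier strings yourself; the package names and the helpers `supported_majors` and `compatible_targets` are hypothetical. Classifiers are self-reported, so a package that declares none tells you nothing and still needs a manual check.

```python
# Hypothetical helpers: infer supported major Python versions from trove
# classifiers that were collected beforehand (e.g. copied from PyPI pages).
from __future__ import print_function  # keeps print() working on Python 2.7

PREFIX = "Programming Language :: Python :: "


def supported_majors(classifiers):
    """Return the set of major versions ('2', '3') a package declares."""
    majors = set()
    for classifier in classifiers:
        if classifier.startswith(PREFIX):
            # "2.7" -> "2", "3 :: Only" -> "3", "Implementation :: CPython" -> ignored
            version = classifier[len(PREFIX):].split(" ")[0]
            major = version.split(".")[0]
            if major in ("2", "3"):
                majors.add(major)
    return majors


def compatible_targets(deps):
    """Given {package: classifiers}, return (majors every package supports,
    packages with no version classifiers that need checking by hand)."""
    targets = {"2", "3"}
    unknown = []
    for name in sorted(deps):
        majors = supported_majors(deps[name])
        if majors:
            targets &= majors
        else:
            unknown.append(name)  # no information: do not treat as unsupported
    return targets, unknown


deps = {
    "pkg_a": ["Programming Language :: Python :: 2.7",
              "Programming Language :: Python :: 3.5"],
    "pkg_b": ["Programming Language :: Python :: 3 :: Only"],
    "pkg_c": ["License :: OSI Approved :: MIT License"],
}
targets, unknown = compatible_targets(deps)
print(sorted(targets), unknown)  # ['3'] ['pkg_c']
```

Of course, this only tells you about the packages you can foresee on day one.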

But anyone who has ever managed a real project knows how futile this research is. Requirements keep changing as a project matures, and it is impossible to predict the features you might have to implement later because of a pivot or a feature request. Therefore, it is impossible (and foolish to try) to predict on day one which packages the project will eventually require.

And yet, you have to make a decision about the language. You have to make that decision right now. Maybe only to regret it at a later point.

2 or 3? 3 or 2? aargh.

I avoid this whole dilemma by insisting that the code base be compatible with both Python 2 and 3 from the get-go.

Not just Python 2. Not just Python 3. But both.

I incur a little overhead by doing this, but not much, because Python 2 and 3 are very similar languages. Many changes in Python 3 were backported to Python 2.7, and the short length of the official porting documentation is further proof of how similar the syntax is. Most of the time, the same code works on both. Once in a while, especially when dealing with Unicode strings, one has to handle things differently depending on the Python version.
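To give a feel for how small that overhead is, here is a minimal sketch of a module that runs unchanged on Python 2.7 and Python 3. The `__future__` imports switch on Python 3 behaviour (true division, the print function, Unicode string literals) under Python 2.7. `PY2`, `text_type` and `to_text` are illustrative names of my own, although the six library offers similar helpers such as `six.PY2` and `six.text_type`.

```python
# Sketch: one source file that behaves the same on Python 2.7 and Python 3.
from __future__ import absolute_import, division, print_function, unicode_literals

import sys

PY2 = sys.version_info[0] == 2  # illustrative flag, similar to six.PY2

if PY2:
    text_type = unicode  # noqa: F821 (name only exists on Python 2)
else:
    text_type = str


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string on both Python 2 and 3."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return text_type(value)


print(to_text(b"caf\xc3\xa9") == "caf\u00e9")  # True
print(7 / 2)  # 3.5 on both versions, thanks to the division import
```

The version check is confined to one place, so the rest of the code base can call `to_text` without caring which interpreter it runs on.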

I think of this overhead as a health insurance premium. You pay a little money every month so that you don’t have to pay a lot when you get really sick. Similarly, you do a tiny bit of extra work by targeting both versions of the language so that you don’t have to work very hard when you find that one lifesaver package that is incompatible with one of your supported Python versions.

Of course, the insurance comparison is not entirely fair, because the moment you include the first such lifesaver package, it will break support for one of the Python versions in your project, and you are back to square one. So whenever I come across such a lifesaver package, I ask myself the following question:

How much time would it require to add support for the unsupported Python version to the lifesaver package?

If the answer is on the order of a few days, I create a pull request that adds support for the missing version to that project. (I have already done this for python-boilerpipe and ark-twokenize-py, and I am currently fixing Python 3 support for TweeboParser.) In return, I become a contributor to an open source project, my project stays compatible with both Python 2 and 3, and all is nice and sunny. If the lifesaver package is huge, that is when I make the big decision to drop support for either Python 2 or Python 3.
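When a port is small enough to be worth a pull request, a good part of the work tends to be mechanical, such as dealing with modules and built-ins that were renamed or moved in Python 3. The sketch below shows one common pattern using only the standard library: try the Python 3 name first and fall back to the Python 2 one. `build_query` and `range_` are hypothetical names used only for illustration.

```python
# Sketch of a common porting pattern: prefer the Python 3 name, fall back to Python 2.
from __future__ import print_function

try:
    from urllib.parse import urlencode  # Python 3 location
except ImportError:
    from urllib import urlencode  # Python 2 location

try:
    range_ = xrange  # noqa: F821 (Python 2: lazy range)
except NameError:
    range_ = range  # Python 3: range is already lazy


def build_query(params):
    """Encode a dict as a query string, sorting keys for stable output."""
    return urlencode(sorted(params.items()))


print(build_query({"q": "python", "page": 2}))  # page=2&q=python
print(sum(range_(5)))  # 10
```

Because the fallback lives in a `try`/`except`, the same file imports cleanly on either interpreter without an explicit version check.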

I like this approach because it lets me make the decision when all the relevant information is available to me. Making it at an earlier point would be like throwing darts in the dark.