Since PyCon 2013 I’ve been in a set of conversations that start with “should I be using Python 3.3 for science work?”. Here’s a recent reddit thread on the subject. Last year I solidly recommended using Python 2.7 for scientific work (as many key libraries weren’t yet supported). I’m on the cusp of changing my recommendation.

Update: there’s a nice thread on Reddit’s r/python discussing what’s required and where the numbers are coming from.

I last looked at the rate of Python downloads via ShowMeDo during 2008 when Python 2.5 was the top dog. The Windows 2.5.1 installer was getting 500,000 downloads a month. In the last 3 months I’m pleasantly surprised to see that Python 3.3 for Windows is downloaded more each month than Python 2.7. We can see:

March 2013: Python 3.3 for Windows has 647k downloads vs Python 2.7 with 630k

February 2013: Python 3.3 for Windows has 553k downloads vs Python 2.7 with 498k

January 2013: Python 3.3 for Windows has 533k downloads vs Python 2.7 with 495k (Python 2.7 has been the less popular download since January 2013)

December 2012: Python 3.3 for Windows has 412k downloads vs Python 2.7 with 525k
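To make the shift easier to see, here is a small sketch that turns the figures above into Python 3.3's share of the two Windows installers' combined downloads. The names `downloads_k` and `py33_share` are just illustrative, and it assumes you run it under Python 3 (so `/` is true division).

```python
# Windows installer downloads in thousands, as quoted above: (Python 3.3, Python 2.7).
downloads_k = {
    "December 2012": (412, 525),
    "January 2013": (533, 495),
    "February 2013": (553, 498),
    "March 2013": (647, 630),
}


def py33_share(py33, py27):
    """Fraction of the two installers' combined downloads that went to Python 3.3."""
    return py33 / (py33 + py27)


shares = {month: py33_share(*pair) for month, pair in downloads_k.items()}

for month, share in shares.items():
    print("{0}: Python 3.3 took {1:.1%} of the combined downloads".format(month, share))
```

This only compares the two Windows installers with each other; it says nothing about Linux or Mac usage.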

These figures only tell a part of the story of course. For Windows you have to download Python. On Linux and Mac it comes pre-installed (so we can’t measure those numbers).

Python 2.7 has been the default on Ubuntu for a while; that’s changing with Ubuntu 13.04. There are two lists of Python 3 compatible packages. It seems that Django is on the list, and at PyCon there was a how-to-port-to-py3 video. Flask isn’t there yet (update: Armin is tweeting for sprint help for Py3 support). SQLAlchemy is supported (but not MySQL-python), and Fabric isn’t ready yet. For web-dev it seems to be a mixed bag, but I’m guessing Python 3 support will be across the board this year.

For scientific use we already have Python 3 compatible numpy, scipy and matplotlib. scikit-learn is ‘nearly’ ported, and Pillow (the recent fork of PIL) is ready for Python 3. NLTK is also being ported.

For scientific use around natural language processing the switch to unicode-by-default looks most attractive (the mix of strings and unicode datatypes has burnt hours for me over the years in Python 2.x). Here’s a PyCon video on the use of Python 3 for text processing and this reviews why Python 3.3 is superior to Python 2.7.
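As a rough illustration of why unicode-by-default helps with text work, here is a minimal Python 3 sketch. The function `count_words` and the sample text are made up for this example: bytes are decoded once at the boundary, everything after that is text, and Python 3 refuses to mix the two types silently.

```python
def count_words(raw_bytes, encoding="utf-8"):
    """Decode bytes to text once, at the boundary, then count lower-cased words."""
    text = raw_bytes.decode(encoding)
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


# Made-up sample input; \u00e9 is an accented "e", escaped to keep the source ASCII.
raw = "Caf\u00e9 caf\u00e9 au lait".encode("utf-8")
counts = count_words(raw)

# Text and bytes are distinct types: 4 characters of text, 5 bytes once UTF-8 encoded.
word = "caf\u00e9"
char_count = len(word)
byte_count = len(word.encode("utf-8"))

# Python 3 raises TypeError when bytes and text are concatenated, whereas
# Python 2 would attempt an implicit ASCII conversion between str and unicode.
try:
    b"caf" + "\u00e9"
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

In Python 2.x the equivalent mix-up tends to surface later, as a `UnicodeDecodeError` far from the line that introduced the byte string.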

It is still slightly too early for me to want to switch, but I’m starting to experiment. I’ve added some __future__ imports to new code so I know I’m writing Python 2.7 in a 3-like style. I’m also increasingly using Ned Batchelder’s coverage.py via nosetests to make sure I have good coverage. I currently run 2to3 to check that things convert cleanly to Python 3 but rarely run the result with Python 3 (I haven’t needed to do this yet). There’s useful advice on python3porting, covering the various __future__ imports (division, print_function, unicode_literals, absolute_import).
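For reference, a module header using the four __future__ imports named above might look like the sketch below. The functions `mean` and `describe` are hypothetical, just there to show the effect; the same file should behave identically on Python 2.7 and Python 3.3, where the imports are accepted but change nothing.

```python
# Hypothetical header for new Python 2.7 code written in a 3-like style.
# __future__ imports must come before any other statement in the module.
from __future__ import absolute_import, division, print_function, unicode_literals


def mean(values):
    """Arithmetic mean; with `division` imported, `/` is true division on 2.7 too."""
    return sum(values) / len(values)


def describe(values):
    """Text summary; with `unicode_literals` the literal is unicode on 2.7 too."""
    return "mean={0:.2f}".format(mean(values))


if __name__ == "__main__":
    # With `print_function`, print is a function on both versions.
    print(describe([1, 2]))
```

Without the `division` import, `mean([1, 2])` would return `1` on Python 2.7 because `/` between integers floors the result there.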

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.