Why all of science should use Python

Bit of a nerdy one today, and not one relating to anything in particular. It’s just that over the past months I’ve become increasingly obsessed with the Python programming language. It’s just beautiful – I’m not going to go into much detail, as the details can be found on python.org, but in short Python can be described as executable pseudocode. For instance, the classic hello world program consists of:

print('hello world')

And then there’s the Zen of Python. This tells you everything you need to know: Python is a language designed for ease of use, to appear sane to anyone who has to read it.

So why is this so important for science?

Well for one thing it’s just a great tool for everything. And I mean everything – with the right modules and frameworks it can be used just as well to process data, write optimization code, build control systems and GUIs, perform algebra, do stats, access databases online and offline, and even create web pages. It is a general purpose language, unlike things like Matlab or Mathematica. While these and others do offer benefits from being more specialized, the advantage of only having to learn one language often outweighs them. Even better, because all the modules are Python, you can tie together different parts of your scientific code: for example, use one module to get data from a database, another to process it, another to create nice graphs, and even one to publish the results.
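To make the "tie the modules together" point concrete, here is a minimal sketch using only the standard library: sqlite3 fetches the data, statistics processes it, and a small function formats the result. The readings table, its celsius column and the three function names are made up for illustration, and the graphing step is left out to keep the example free of third-party dependencies.

```python
import sqlite3
import statistics


def load_readings(conn):
    """Fetch every value from the (hypothetical) readings table."""
    rows = conn.execute("SELECT celsius FROM readings ORDER BY id").fetchall()
    return [row[0] for row in rows]


def summarize(values):
    """Compute a few descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def report(summary):
    """Format the summary as a one-line text report."""
    return "n={n}, mean={mean:.2f}, stdev={stdev:.2f}".format(**summary)


# An in-memory database stands in for a real lab database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, celsius REAL)")
conn.executemany(
    "INSERT INTO readings (celsius) VALUES (?)",
    [(20.0,), (22.0,), (24.0,)],
)
conn.commit()

summary = summarize(load_readings(conn))
print(report(summary))  # prints: n=3, mean=22.00, stdev=2.00
conn.close()
```

Each step is an ordinary function taking and returning plain Python objects, so swapping the database for a CSV file, or the text report for a plot, only touches one function.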

There are plenty of scientific modules out there and they are improving all the time, with NumPy and SciPy being the most important foundation for dealing with lots of numbers. In fact the only problem is that there can be quite a lot of modules which do similar things, some of which are more up to date and useful than others, meaning a choice needs to be made. In that regard, scientific Python distributions like EPD can help a lot.

But what I really want to talk about is more than how nice the language is. It is nice, and does have tools available for more or less every scientific need. But more important than that, it is free – both as in beer and as in freedom. Since it has no cost and is open source, it is accessible to everyone. Anyone can install Python and run someone else’s code, and they don’t have to pay for a very expensive proprietary software license to do so. This is crucial for being able to reproduce scientific results. In business it doesn’t really matter if people use a costly tool since everything stays internal. In research, results need to be reproducible. More and more research depends on work done using computer code, one way or another. If people don’t have the costly software needed to run that code, then they are prevented from running that experiment in the exact way it was originally done.

With Python code, anyone can get the same data analysis running. Thanks to how simple Python is to use and how easy it is to read, the barrier for someone else trying to understand the code is much lower than for many other languages.

This is also an equality issue – while wealthy institutions may bite the bullet and buy licenses for their academics, many simply cannot afford to, especially in developing countries. It is not fair to discriminate against scientists around the world simply because they can’t buy their way into the computer club.

Students also benefit greatly from learning a free tool rather than a proprietary one. When they leave, they will not be locked into using an expensive piece of software. Their skills will always be useful to them, because even if their workplace doesn’t provide a programming environment, they can just use a free Python-based one.

Finally, Python could bring a measure of standardization to scientific coding practice. I have witnessed some very dodgy ad-hoc setups in labs, the sort of thing that would make a business project manager cry: bits of code in one language using other bits of code in a different language via various hacks, all because everyone knew a different language and no one could be bothered to coordinate anything. Plus the languages used were all too difficult or inflexible to allow newcomers to usefully improve them.

So I urge everyone who can to use Python for their scientific work. I say this also hoping it will encourage improvement in the Python ecosystem. There do remain a couple of warts: in particular, packaging and distributing Python programs can be a bit of a pain, especially if you require specific modules for your program to work. I would really like to see the emergence of standard practice for Python in science: conventions on organization, naming and publishing of code. I would love to see something like a Github for Science, where teams could publish and manage their code and refer to it in published papers (hmm, I like that so much it might get its own post!).

While waiting for all this magic to happen, do yourself a favor and program in Python!

Nice tip: If you’re on Linux or OS X, open a terminal and type:

python (This will open a Python prompt.)

Then type

import this

Then quit Python by typing quit()