Python 2 vs Python 3

“Should I learn Python 2 or Python 3?” For everyone who has just started to learn Python for Data Science, this is an important initial question to answer. There are many ongoing discussions on the topic and you might have found it hard to get a straightforward answer. I also had this question for quite a while, as I want to teach the most relevant Python version here on the blog. So I decided to reach out to practicing senior Data Scientists and ask for their opinions. After several hours of discussions and research, I have a definitive answer for you. In this article I will summarize my top takeaways.

tl;dr: you should learn Python 3.

Python 2 vs Python 3 – what’s the difference?

To be honest, given that you are not an engineer, but a practicing/aspiring data scientist, you won’t see major differences between Python 2 and Python 3.

Performance-wise

For a long time Python 3 was actually claimed to be slower than Python 2, which might sound odd, I know. Either way, in 2017, that’s not the case anymore. Python 3.7 is already in alpha (the final release is expected in 2018), and if the promises are kept, this is gonna be the fastest Python version ever. But don’t put too much emphasis on performance anyway. As Wes McKinney says in Python for Data Analysis:

“As Python is an interpreted programming language, in general most Python code will run substantially slower than code written in a compiled language like Java or C++. As programmer time is often more valuable than CPU time, many are happy to make this trade-off.”

And I fully agree with this idea.

Syntax-wise

There are small but rather annoying differences. I was using Python 2 for a long time, and learning the small changes that were made in Python 3 was a bit unpleasant. But once I got used to the new version, all these new things felt so much more logical. I’ll give you two examples. The first one is how printing works. In Python 2, print is a statement: print "Hello, World!"

And in Python 3, print is a function call: print("Hello, World!")

The extra parentheses in Python 3 might seem unnecessary at first, but they are in fact very logical: in Python, we call every function with parentheses. And why wouldn’t print be a function? (It wasn’t in Python 2.)

Note: actually the syntax print("Hello, World!") works with Python 2 as well (at least with a single argument, where the parentheses are treated as plain grouping), but it’s not true the other way around: print "Hello, World!" doesn’t work with Python 3.

Note 2: the cherry on top of the Python 3 version is that print("Hello, World!") is the same syntax used in R.
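To make the "print is a function" point a bit more concrete, here is a minimal Python 3 sketch. The Python 2 statement appears only as a comment, because it is a syntax error in Python 3. The variable names (buffer, captured) are just illustrative; sep, end and file are standard keyword arguments of the built-in print function.

```python
import io

# Python 2 (statement form; a SyntaxError in Python 3):
#     print "Hello, World!"

# Python 3 (function form):
print("Hello, World!")

# Because print is an ordinary function, it accepts keyword arguments.
# Here the output goes into an in-memory buffer instead of the screen.
buffer = io.StringIO()
print("Hello", "World", sep=", ", end="!\n", file=buffer)
captured = buffer.getvalue()
```

Being a regular function is what makes options like a custom separator or a different output target possible without any special syntax.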

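Another example of the differences is how the two Python versions handle integer division and the fractional part of the result. In Python 2, dividing two integers with / rounds the result down to a whole number (5 / 2 gives 2), while in Python 3 the fractional part is kept (5 / 2 gives 2.5).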

The Python-3-way is much more intuitive. (For me at least.)
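As a small sketch of the division behavior, the snippet below runs on Python 3. The variable names are only for illustration. For comparison, in Python 2 the expression 5 / 2 evaluates to 2 when both operands are integers.

```python
# Python 3: "/" is true division and always returns a float.
true_quotient = 5 / 2

# "//" is floor division: the result is rounded down to a whole number.
# This is what "/" did with two integers in Python 2.
floor_quotient = 5 // 2

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -5 // 2

print(true_quotient, floor_quotient, negative_floor)
```

If you actually want the old whole-number behavior in Python 3, the // operator states that intention explicitly.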

If you want to learn more about the specific differences, read this article.

Which one to learn? And why?

It all comes down to this question, right? Python 2 vs Python 3! Who’s the winner?

Previously, I suggested learning Python 2, because most companies were still using it for legacy reasons. But this is not a strong enough argument anymore!

First off, Python 3 has been around since 2008, and more than 95% of the data science related features and libraries have been migrated already. So it’s already a fully featured language for data science.

Secondly (and more importantly), support for Python 2 ends in 2020. This means that even the companies that have been using Python 2 will have to migrate to Python 3 soon. Thus learning Python 3 will make you more compatible and more valuable for your next job.

And third, Python 3 is a bit more logical and practical in the little details. And since it’s the version under active development, it will also be much better in terms of performance than Python 2.

Note: and reason #4 is that on Data36 all of my Python for Data Science tutorials will be in Python 3. 🙂

At this point, if you are new to Python for Data Science, I think there is no reason to learn Python 2; you should learn Python 3! Invest in the future, not in the past.

If you are still on Python 2…

… consider learning Python 3. I did it, so I can tell you from personal experience: it’s not a big deal. But if you can’t do it, because you are relying on very special Python 2 libraries that have not been migrated to Python 3 yet, or you are constrained by your company’s code base, I still recommend preparing your code for Python 3.

There is a very nice project called Python Future that offers a set of libraries with which to do that!
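One simple way to start preparing code, sketched below, does not use the Python Future project's own libraries at all. It relies only on the standard-library __future__ module, which lets a Python 2 file opt in to the Python 3 print function and division behavior discussed above. The average helper is a hypothetical example, not something from an existing code base.

```python
# Opt in to Python 3 behavior. On Python 3 these imports change nothing,
# so the same file runs unchanged on both versions.
from __future__ import division, print_function


def average(values):
    """Return the mean of a non-empty list of numbers (hypothetical helper)."""
    # With true division enabled, the fractional part is kept on both versions.
    return sum(values) / len(values)


result = average([1, 2])
print("Average:", result)
```

Note that the __future__ import has to be the first statement in the file. Code written this way behaves the same on Python 2.7 and Python 3, which makes the eventual switch much less painful.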

Conclusion

I didn’t mean to be too dramatic here! As I said above, the difference between Python 2 and Python 3 is not that big at all! Whichever you choose, you can learn the other one in a matter of hours. But in 2017, the winner of the Python 2 vs Python 3 battle is clearly Python 3. So if you can choose which one to learn, choose that!

With that being said, come and continue learning Python for Data Science.

If you want to learn more about how to become a data scientist, take my 50-minute video course: How to Become a Data Scientist. (It’s free!)

Also check out my 6-week online course: The Junior Data Scientist’s First Month video course.

Cheers,

Tomi Mester