When I first started building Pinecast last year, I chose the best tools that I had at my disposal at the time:

Python

Django

Postgres

I used Postgres because I’ve had great success with it in the past. I chose Django because I wanted a framework with a built-in ORM and I’d used it for some very successful projects in the past. Django runs in Python, and at the time, the only version of Python that would support the other libraries I needed was Python 2.7.

I won’t get into the discussion of whether Python 2.7 is superior or inferior to Python 3, but I’ll note these few things:

I would prefer to use Python 3. I like running on software that has active momentum behind it. Python 2.7, for all intents and purposes, has no momentum.

I have never worked on a successful Python 3 project. This is not the direct fault of the Python community, but it means that I do not have enough life experience to know which rough edges I’m going to rub up against.

Dependencies are the biggest problem. If you rely on a package that isn’t Python 3 friendly, you’re just out of luck. Your project will remain in Python 2.7 limbo indefinitely, or until someone upgrades your dependency. Pray that the dependency is maintained.

Upgrading from 2.7 to 3 is painful. I think this is inarguably the truth. Using ES2015 instead of ES5 simply means setting up some boilerplate in your project root. Using PHP7 instead of PHP5.6 simply means making sure you didn’t do anything utterly disgraceful with your code. Using CSS3 instead of CSS2.1 means you don’t need to do anything. Using Python 3 instead of Python 2.7 means you’re in for a world of hurt, with more code meaning more hurt.

For the second and third reasons, I went with Python 2.7 from the start. I needed something to work ASAP, and I didn’t want to use Pinecast as an excuse to fumble my way into a production Python 3 project.

So what was blocking Pinecast from upgrading after it was settled and running in production? For a very long time, we were blocked by some dependencies:

grequests: In order to parallelize some external requests to our analytics provider we used grequests, which is something of a wrapper around the amazing requests library. grequests uses gevent, which only got Python 3 support around the time Pinecast was gaining momentum. grequests is still not fully Python 3 compatible.

django-postgrespool: This library does some fiddly database magic to pool connections to Postgres. Including it was recommended by Heroku as part of their Django template at the time (which I borrowed my DB code from). After certain errors started appearing, I removed it, as did Heroku. Ironically, this package is made by the same author as grequests.
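grequests was essentially a way to fire off several HTTP requests concurrently. For anyone in the same spot on Python 3, the standard library’s concurrent.futures can cover that need without gevent. This is a swapped-in technique, not how Pinecast solved it, and fetch_stats is a hypothetical stand-in for a real HTTP call (for example, one made with requests):

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_stats(episode_id):
    """Hypothetical stand-in for an HTTP request to an analytics API.

    Real code might call requests.get(...) here and return parsed JSON.
    """
    return {"episode": episode_id, "listens": episode_id * 10}


def fetch_all(episode_ids, max_workers=8):
    """Run fetch_stats for every ID concurrently on a thread pool.

    executor.map returns results in the same order as the inputs,
    even if the underlying calls finish out of order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_stats, episode_ids))


if __name__ == "__main__":
    print(fetch_all([1, 2, 3]))
```

Threads are a reasonable fit here because the work is I/O-bound: each worker spends most of its time waiting on the network, not holding the GIL.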

django-postgrespool was removed from Pinecast in January. grequests lingered until a couple weeks ago: Pinecast dropped the analytics vendor that we had used and switched to an InfluxDB instance running on AWS. Using the influxdb Python package has been a dream, and made grequests unnecessary. Removing it decreased our build and test times, and unblocked me from using Python 3!

Upgrading

As I mentioned above, upgrading from Python 2.7 to 3 is an ugly and painful process. I spent a whole morning on it, which might not seem like a lot of time, but as I’ll explain later, it could have been much worse if I had not been proactive.

My first stab at this was running PyLint:

pylint --py3k

This gave me a litany of errors, and they boiled down to this:

Using absolute imports with from __future__ import absolute_import and fixing all the subsequent import errors.

Stop using unicode() and replace it with str() everywhere. This was problematic for detecting unicode strings in some places (e.g., isinstance(foo, (str, unicode))), which led me to turn to types.StringTypes, though this was removed in Python 3. I created my own StringTypes by wrapping (str, unicode) in a try block and catching NameError, where I then redefine it as (str, ). Gross, but it does the job well.

Using the print() function instead of the print statement.

Somehow I still had some code that used cmp= on sorted().

True division with from __future__ import division. This was a real hassle because it’s very difficult to know what kind of division a piece of code needs to use. Thankfully, Pinecast doesn’t do much division.

There was a bunch of code to handle long in certain weird edge cases, but I don’t think it’s ever run in the history of the project. I just removed it.

Replacing xrange() with range(), which was thankfully straightforward.
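Several of these items come down to a few small shims. The sketch below shows what they can look like in one module; the StringTypes fallback follows the try/NameError approach described above, while compare_counts and sort_by_count are hypothetical examples of converting a cmp= call, not Pinecast’s code:

```python
# The __future__ imports switch Python 2.7 to Python 3 semantics for
# imports, division and print; on Python 3 they are no-ops.
from __future__ import absolute_import, division, print_function

import functools

# Python 3 has no `unicode` built-in, so referencing it raises NameError.
try:
    StringTypes = (str, unicode)  # Python 2: byte strings and unicode strings
except NameError:
    StringTypes = (str, )  # Python 3: str is already unicode


def is_string(value):
    """Return True for text strings on both Python 2 and 3."""
    return isinstance(value, StringTypes)


# sorted(..., cmp=...) is gone in Python 3. functools.cmp_to_key
# (available since 2.7) adapts an old-style comparison function.
def compare_counts(a, b):
    """Old-style cmp function: negative, zero, or positive."""
    return (a[1] > b[1]) - (a[1] < b[1])


def sort_by_count(pairs):
    return sorted(pairs, key=functools.cmp_to_key(compare_counts))


if __name__ == "__main__":
    print(is_string("abc"), is_string(b"abc"))  # True False on Python 3
    print(7 / 2, 7 // 2)  # 3.5 3 with true division
    print(sort_by_count([("b", 2), ("a", 1)]))
```

cmp_to_key is the mechanical translation; where the comparison is simple, a plain key= function (here, key=lambda pair: pair[1]) is shorter and faster.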

The next step was using modernize to find other glaring issues:

Many built-ins (such as map(), filter(), and zip()) now return iterators, and dict methods like .items() return views instead of lists, which means my code needed to handle that.

Modernize assumes you’re not iterator friendly at all, so it pulls in zip, map, and more from six.moves. In most cases, these were unnecessary, or I was happier to slap a list() on the code rather than use a different, magical function.
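The practical difference is that an iterator can only be consumed once, while a list can be indexed and reused. A small illustration (with made-up data) of when a list() wrapper is actually needed and when it is just noise:

```python
# On Python 3, map(), filter() and zip() return lazy iterators.
words = ["pinecast", "django", "postgres"]

lengths = map(len, words)
total = sum(lengths)        # consumes the iterator
leftover = list(lengths)    # already exhausted, so this is empty

# If the result is needed more than once (or indexed), materialize it.
lengths_list = list(map(len, words))
first = lengths_list[0]
again = sum(lengths_list)

# If it is consumed exactly once, no list() is needed at all.
longest = max(zip(map(len, words), words))

print(total, leftover, first, again, longest)
```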

In the end, I ignored almost all of Modernize’s changes and wrote my own. It was a useful tool, though, and I recommend using it to catch some weird edge cases.

Python’s docs also recommend the future package, but after using PyLint and Modernize, I found that it produced nothing terribly actionable.

Next I went through and set up some tests, got Travis to test with 3.5.2, and added caniusepython3 to my Makefile so the tests will refuse to pass if I add an incompatible dependency. I also added a runtime.txt file to tell Heroku to run the app on Python 3.

I played with 2to3 for about half an hour, but was entirely unsuccessful at getting it to do anything meaningful. For one, 2to3 undid a bunch of changes that PyLint and Modernize suggested, which isn’t great (especially since I’m using PyLint to detect obvious compatibility regressions). For instance, it removed all of the from __future__ import … statements from the codebase. It also has poor heuristics and creates a ton of noise. As an example:

```python
select = ', '.join(
    select_format(k, v) for
    k, v in
    self.selection.items())

# became

select = ', '.join(
    select_format(k, v) for
    k, v in
    list(self.selection.items()))
```

The only addition is the list() call. Essentially, this change only makes the code worse. Yes, it preserves the behavior of Python 2.7, where the output of .items() is a list. But since the result is consumed directly by a generator expression, it doesn’t matter whether it’s a list or a view, and creating a list adds totally unnecessary overhead. The same happens in other places as well:

```diff
- for ep_id, (pod_id, count) in ep_listens_before.items():
+ for ep_id, (pod_id, count) in list(ep_listens_before.items()):

- list(reversed(sorted(top_ep_data.items(), key=lambda x: x[1])))[:25]
+ list(reversed(sorted(list(top_ep_data.items()), key=lambda x: x[1])))[:25]
```

It’s entirely possible to detect where adding a list() is unnecessary (and I assume Modernize does this, since its output was not nearly as noisy).
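To make the point concrete: on Python 3, dict.items() returns a view, and anything that iterates it once (a for loop, a generator expression, sorted()) works on the view directly. The list() copy only matters when the dictionary is mutated during the loop. A small sketch with made-up listen counts:

```python
# Hypothetical listen counts keyed by episode ID.
listens = {"ep1": 40, "ep2": 15, "ep3": 72}

# Iterating a view once: no list() needed.
summary = ", ".join("%s=%d" % (k, v) for k, v in sorted(listens.items()))

# sorted() accepts any iterable and always returns a new list.
top = sorted(listens.items(), key=lambda kv: kv[1], reverse=True)[:2]

# list() *is* needed when the loop changes the dict's size:
# iterating the live view here would raise RuntimeError.
for ep_id, count in list(listens.items()):
    if count < 20:
        del listens[ep_id]

print(summary)
print(top)
print(listens)
```

sorted(..., reverse=True) is a common alternative to list(reversed(sorted(...))), though the two can order tied values differently, so it is not a drop-in replacement when ties matter.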

By default, 2to3 also stripped the u prefix from my string literals, even though PEP 414 made u'' literals legal again in Python 3.3.
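For reference, this is why stripping the prefix buys nothing on a modern interpreter: under PEP 414 the u prefix is accepted and simply produces an ordinary str, so the same literal runs unchanged on Python 2.7 and Python 3.3+.

```python
# PEP 414 (Python 3.3+): the u'' prefix is legal and has no effect,
# so this literal means the same text on 2.7 and on 3.3+.
greeting = u"caf\u00e9"

print(type(greeting) is str)        # True on Python 3
print(greeting == "caf\u00e9")      # True on Python 3
```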

It seems, from this experience, that 2to3 is meant for folks with a one-way ticket to Python 3. I expected far more from the Python community on this one. The fact that it doesn’t generate Python 2-friendly code out-of-the-box is a deal breaker for me, and the fact that it generates so much noise for completely unnecessary “fixes” left a very sour taste in my mouth.

Problems found manually

Actually running the code led me to find some fun problems:

email.Utils is just gone, and email.utils is there instead. It would have been helpful to get a warning; I assume this was deprecated in Python 2.7 and removed in 3 (my code referenced both), and having 2.7 warn about this would have been great. It’s also curious that none of the tools that I used caught this one, since it’s a standard library change.

hashlib’s hashing methods now only accept bytes-like objects, not str. The same is true for hmac.

base64.b64encode() only accepts bytes, and both b64encode() and b64decode() return bytes rather than str.

'%x' % foo will no longer cast a float to int. So that was a problem, and was easily fixed. A warning from PyLint would have been great on this one: “Hey, you’re using %x and not explicitly casting the value you’re jamming in there, you should consider it.”
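A minimal sketch of the explicit conversions these changes force, assuming text is UTF-8. The helper names and the key are hypothetical, and this is not Pinecast’s actual code:

```python
import base64
import hashlib
import hmac
from email.utils import parseaddr  # email.Utils no longer exists on Python 3

SECRET = "not-a-real-secret"  # hypothetical key, for illustration only


def sha256_hex(text):
    # hashlib only accepts bytes-like objects on Python 3, so encode str first.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sign(message):
    # hmac also needs bytes; digestmod is required on Python 3.8+.
    return hmac.new(SECRET.encode("utf-8"), message.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def b64_text(text):
    # b64encode takes bytes and returns bytes; decode to get a str back.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def hex_id(value):
    # '%x' % 3.0 raises TypeError on Python 3, so cast explicitly.
    return "%x" % int(value)


if __name__ == "__main__":
    print(sha256_hex("abc"))
    print(sign("upload/123"))
    print(b64_text("hello"))   # aGVsbG8=
    print(hex_id(255.0))       # ff
    print(parseaddr("Pinecast <hello@example.com>"))
```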

Push it to production, what could go wrong

After clicking around through the bulk of the site, I pushed the changes to production and opened the Rollbar dashboard to see the errors start to roll in. There were surprisingly few. Here were the surprises:

Python 3 wonked up urllib by reorganizing it into sub-libraries. urllib.quote became urllib.parse.quote. urllib.pathname2url became urllib.request.pathname2url. Why? We may never know, especially with urllib2 being a thing. Regardless, I don’t know why this ended up being a runtime error and wasn’t caught by any of the tools I used.

dateutil stopped parsing ISO-8601 timestamps with a trailing “Z”. I didn’t look too deeply into why, but it just doesn’t parse anymore. Easy fix, but it took some debugging.

Despite thinking I’d caught all the edge cases, I still found problems around base64, hashlib, and hmac.

I’m using the lovely itsdangerous library by Mitsuhiko for generating upload URLs. It’s quite good, but the Python 3 support is a bit thin. Here’s the only real piece of the documentation that explains anything useful:

On Python 3 the interface that itsdangerous provides can be confusing at first. Depending on the internal serializer it wraps the return value of the functions can alter between unicode strings or bytes objects. The internal signer is always byte based.

Emphasis mine. I realize itsdangerous isn’t a super widely-used package, but this isn’t an excellent story for developers, especially for something as important as Python 3.
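Since none of the tools flagged the urllib reorganization, a defensive import block is one way to handle it while a codebase still has to run on both versions. This is a common pattern, sketched here; it is not necessarily what Pinecast does:

```python
# Import the relocated urllib functions, falling back to their
# Python 2 locations so the same module runs on both versions.
try:
    from urllib.parse import quote
    from urllib.request import pathname2url
except ImportError:  # Python 2.7: urllib.parse does not exist
    from urllib import quote, pathname2url


print(quote("my podcast/episode 1.mp3"))
print(pathname2url("episodes/episode 1.mp3"))
```

If six is already a dependency, from six.moves.urllib.parse import quote does the same job without the try block.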

Thoughts on Python 3

If anyone is unsure why Python 3 adoption has been slow, just look at my experience. The hassle I went through thankfully wasn’t too terrible, but it could have been a lot better, especially around issues like the urllib and email renaming. I think the reason this wasn’t more painful is twofold: I had been planning to upgrade from the get-go and avoided choices that would have blocked that, and Pinecast has a relatively small codebase. Neither of those things is likely to be true for most other startups, where velocity is high or the product has a larger footprint.

So what went well?

The number of changes I had to make overall was fairly small. I don’t think I ran into any weird syntax-ish things, like unicode literals being removed.

Heroku and Travis both worked with Python 3 flawlessly. 👌

There was no downtime in the transition, though some weird bugs persisted for almost an hour and made a couple of users sad (I reached out and apologized). Considering the magnitude of this upgrade, I think that is quite good.

I get to use all the fun Python 3 standard library stuff now! I feel like a front-end engineer whose company just dropped support for IE9.

I won’t give a very scientific set of measurements, but according to Heroku’s metrics the maximum memory usage decreased by ~1.5MB, median average response time decreased by ~30ms (10%), and 95th percentile average response time decreased by ~300ms (15%). That’s pretty damn good!

So what could have gone better?

Much of the time I spent working with the various tools involved fiddling with absolute imports and undoing Modernize’s changes to the built-in functions and built-in type methods: out of the two dozen or so changes, I ended up accepting two. I get that this is a hard problem, but the UX around this (for the developer) could be much improved. I ended up using git add -p and doing a git checkout -- . after I’d committed the good stuff. Something that wraps that flow would have been great.

2to3, given that it’s the community’s official answer to Python 3 compatibility, is an unmitigated disaster. Other tools like Modernize and PyLint did a far better job (on my codebase) and produced substantially less noise. They also allowed me to target both Python 2 and Python 3. 2to3 could not have been more unhelpful.

Sneaky runtime problems could have been avoided. I would have appreciated any of the tools I used giving me a warning with something like “Hey, you’re using b64encode, you probably will have issues in Python 3.” If such warnings exist, they need to be talked about more.

My Heroku slug size increased by almost 20MB by switching to Python 3. Deploy times increased by about 10%. I’m looking for ways to make that better, but it’s not promising. Not a big problem, but I’m not thrilled about it.

I’ve wanted to play with PyPy for a while now, but now that I’m on Python 3, the chances that anything will work at all have dropped substantially. Pinecast does a few things that are CPU-bound, and PyPy would have been a cool way to make things better.

Overall, I feel like I’ve learned a few things, and I’m very satisfied that my code is running well on Python 3.