Key Findings

Disclosure

After a few trial runs it was clear that the stack came under stress easily: average response times climbed to unfriendly values, driven by the database read/write response times.

As such I accepted it for what it was and considered this a load test.

Overall Results

After all test executions, cleanups and warm-ups, here are the results.

Overall results table

Total requests

There's a really big difference between Go and all the others. At the end of the 5-minute test, Go had processed 1.74 times as many events as Elixir and roughly twice as many as Python.

Average Response Time

Average response times tell the same story as the total number of requests, no surprise here. Go averaged 559 ms, Elixir stayed below 1 s at 974 ms, and Python came in at around 1,144~1,193 ms.

Throughput

The average throughput can be translated into:

Go: ~10k RPM
Elixir: ~6k RPM
Python: ~5k RPM
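To put the RPM figures above into more familiar units, here is a minimal sketch that converts them into requests per second and into the total number of requests they imply over the 5-minute run. The inputs are the rounded values quoted above, so the outputs are rough estimates rather than measured totals, and the function names are my own.

```python
# Approximate throughput quoted above, in requests per minute (RPM).
RPM_WITH_DB = {"Go": 10_000, "Elixir": 6_000, "Python": 5_000}

TEST_DURATION_MINUTES = 5  # each test ran for 5 minutes


def requests_per_second(rpm: float) -> float:
    """Convert requests per minute into requests per second."""
    return rpm / 60


def implied_total_requests(rpm: float, minutes: float = TEST_DURATION_MINUTES) -> float:
    """Total requests implied by a constant average RPM over the whole run."""
    return rpm * minutes


def summarize() -> None:
    """Print one line per stack with both derived figures."""
    for stack, rpm in RPM_WITH_DB.items():
        print(
            f"{stack}: ~{requests_per_second(rpm):.0f} req/s, "
            f"~{implied_total_requests(rpm):,.0f} requests in {TEST_DURATION_MINUTES} min"
        )
```

Calling summarize() prints the derived figures for each stack. Because the RPM values are rounded, the ratios between them will not match the measured ratios exactly.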

Given my previous experience with Elixir I expected it to show significantly better single-process performance. Comparing Elixir to Python in this context reveals a performance difference of 18%. IMHO, not enough to justify a stack swap.

Japronto vs Flask didn't bring anything relevant to the table in this scenario. With only a 3.8% performance improvement, Japronto was not a game changer for Python.

Go delivered as expected. The numbers say it all: 559 ms under a load of 10k RPM is definitely a good result, not only by comparison but in absolute terms.

Individual Performance Signatures

Python — flask

Flask's response time signature is quite linear. There's a response time bump around minute 1 but after repeating the tests a couple of times it shows no impact in the final outcome.

Python — japronto

Japronto's signature was a bit erratic. It varied quite a lot but all tests showed the same pattern.

Elixir — plug

Elixir's signature consistently showed a wave-like pattern in all tests. This looks like an underlying behaviour which I didn't dig into, but which I assume we can attribute to BEAM (Erlang's VM).

Go — gin

Go's signature shows a linear increase but, of all the frameworks, Go showed the highest standard deviation in response times.

InfluxDB

From the ground up, InfluxDB was very easy to install and configure. I tried it on macOS and Ubuntu 16.04 and it played nicely on both.

It has a SQL-like query syntax, which helps most people find their way around it quickly. It also introduces three notions: Measurements (roughly similar to a Collection or Table), Tags (which InfluxDB indexes automatically, so you can query by them) and Fields (which hold the actual data values).
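To make the Measurement / Tag / Field split concrete, here is a minimal sketch that renders a single point in InfluxDB's line protocol text format (measurement, then tags, then fields, then an optional timestamp). It is not the code used in these tests. The helper names and the "events" measurement with its tags are hypothetical, and only the basic escaping rules are covered.

```python
def _escape(value: str, chars: str) -> str:
    """Backslash-escape every character of `chars` that appears in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value) -> str:
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns=None) -> str:
    """Build one line: measurement,tag=value,... field=value,... [timestamp]."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # tags are the indexed part of the point
        head += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    body = ",".join(
        f"{_escape(name, ',= ')}={_format_field(value)}" for name, value in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


# Hypothetical example point: one measurement, two tags, one integer field.
example = to_line_protocol(
    "events", {"type": "click", "source": "web"}, {"value": 1}, 1500000000000000000
)
```

In this sketch, anything you want to filter or group by goes into the tags dictionary, and the values you actually measure go into fields.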

It also provides a nice Admin webpage which lets you query and visualize data. You may even check DB stats and Diagnostics as it provides a couple of pre-defined query examples.

However I was really sad to find this in their docs:

Open-source InfluxDB does not support clustering. For high availability or horizontal scaling of InfluxDB, please investigate our commercial clustered offering, InfluxEnterprise.

Being able to use a DB cluster is quite important for us at this stage. Though it can be quite a while until you need an InfluxDB cluster, it's not nice to know you can hit a hard wall, and a paywall at that.

I'm not sure I'd use InfluxDB for a highly scalable scenario, as I wouldn't be OK with paying for a clustering solution when I can deploy solutions like Cassandra, Hadoop or Kafka (not quite the same, but still relevant).

Out with the DB

Our main purpose was to find out how a full stack would behave in the same conditions and we achieved that. However, it’s noticeable that the response times increase continuously and in a linear manner (kind of).

Before concluding this experiment, I wanted to compare how the language stacks would fare against each other if we took the most time-consuming variable out of the equation: the database.

For this purpose I ran the exact same tests, but with no write or read: we parse the request params and return a fixed number. A dummy test.
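As an illustration of what such a dummy endpoint looks like, here is a minimal sketch using only Python's standard library. This is not the Flask or Japronto code used in the tests. The port, the fixed value 42 and the names parse_params, DummyHandler and run are all assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

FIXED_RESPONSE = b"42"  # arbitrary constant; the value returned in the tests is not stated


def parse_params(path: str) -> dict:
    """Parse the query string of a request path into a dict of value lists."""
    return parse_qs(urlparse(path).query)


class DummyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parse_params(self.path)  # parse and discard: no database read or write
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(FIXED_RESPONSE)))
        self.end_headers()
        self.wfile.write(FIXED_RESPONSE)

    def log_message(self, format, *args):
        pass  # silence per-request logging so it does not skew timings


def run(port: int = 8000) -> None:
    """Serve the dummy endpoint on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), DummyHandler).serve_forever()
```

Calling run() starts the server. Every GET request then gets the same fixed body, whatever parameters it carries, so the load test measures only the HTTP layer and parameter parsing.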

The results were a bit different than expected:

In this scenario, Elixir was the fastest. Japronto responded faster than Go, and Flask was the slowest of all.

Let's look at the RPM values this time around:

Go: ~80k RPM
Elixir: ~113k RPM
Python-Japronto: ~95k RPM
Python-Flask: ~25k RPM

This definitely gives us some perspective on HTTP server response times vs full stack (server + database wrapper) response times. Interestingly enough, we see a big delta between Flask and Japronto, which was not the case when InfluxDB was in the mix.
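To quantify that perspective, here is a small sketch that divides each stack's RPM without the database by its RPM with the database. It uses the rounded figures from this post and assumes both Python frameworks sat at roughly 5k RPM with InfluxDB (they were only a few percent apart), so treat the ratios as ballpark values.

```python
# Rounded RPM figures from this post. The with-database Python value is
# applied to both frameworks, which is an approximation.
RPM_WITH_DB = {
    "Go": 10_000,
    "Elixir": 6_000,
    "Python-Japronto": 5_000,
    "Python-Flask": 5_000,
}
RPM_WITHOUT_DB = {
    "Go": 80_000,
    "Elixir": 113_000,
    "Python-Japronto": 95_000,
    "Python-Flask": 25_000,
}


def speedup_without_db(stack: str) -> float:
    """How many times more requests per minute a stack served once the DB was removed."""
    return RPM_WITHOUT_DB[stack] / RPM_WITH_DB[stack]


def report() -> None:
    """Print the ratio for every stack, largest first."""
    for stack in sorted(RPM_WITH_DB, key=speedup_without_db, reverse=True):
        print(f"{stack}: ~{speedup_without_db(stack):.1f}x")
```

With these inputs Go comes out at about 8x and Flask at about 5x, while Elixir and Japronto land close to 19x. That fits the reading that, for the latter two, the database was by far the dominant cost in the full-stack test.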

Conclusion

In most scenarios, you can achieve the exact same end result with Python as you do with Go. With Python, you’ll probably be able to ship far faster, and have more developers available.

Python is also far more economical in the amount of code that needs to be written: it took me around 100 lines of Go to do what took 20 lines of Elixir and 35 of Python.

That said, you shouldn't really use lines of code as a metric without strong context, because in scaling-sensitive scenarios the line count is not what matters. In our testing, Go set the bar for performance very high, and while you may need to take the time to write better tests and keep an eye on code styling and documentation, it feels like it is worth the effort.

So, given these results, are we actually going to replace any of Unbabel’s core services with one of these stacks? Our current thinking is that Go might be best for processing, Elixir for I/O intensive services and Python for more mainstream scenarios. But yes. The answer is yes.

How we go about it is a topic for another post. ;)