Go vs CPython: Visual comparison of concurrency and parallelism options

Using MPG diagrams to see the differences between Threading, Multiprocessing, and Asyncio (the three official CPython options) and the Go runtime.

This article assumes that your tasks are CPU-intensive. For IO-intensive tasks every option performs comparably well, with large differences in RAM usage and only minor differences in CPU usage and performance, given a reasonable concurrency level.

Update: This article turned into a Python Brasil 2017 talk, now on YouTube (audio in Portuguese). Slides available on slides.io.

Update 2: grumpy-runtime v0.3.0 has been released and is now pip-installable. Details at https://labs.getninjas.com.br/released-grumpy-runtime-v0-3-0-a05f1cf8e111

After reading about the way the Go runtime schedules OS threads and green threads ("goroutines") over the hardware CPUs, I found myself comparing it with CPython's current concurrency options: Threading, Multiprocessing, Asyncio, and Green threads (greenlet, gevent, and others).

The MPG diagrams used in Daniel Morsing's article seem a good tool to visually model the concepts of Machine threads, Processing contexts, and Green threads (called goroutines in the original, but let's [ab]use the concept…)

MPGwhat??

Machines are the OS Threads the runtime spawns/forks to run its work. No news here: the OS schedules them and switches between them at will, even in the middle of an operation.

Processing contexts (originally called Processors) hold the state that a Green thread (G) needs loaded before it can run. They resemble a user-space CPU.

Green threads (or goroutines, in the Go world) are user-space threads, scheduled by the application rather than the OS. Usually there is a scheduling loop controlled by the language runtime.
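To make the "scheduling loop controlled by the runtime" idea concrete, here is a minimal sketch of a cooperative user-space scheduler in Python (the names `scheduler` and `worker` are illustrative, not from any real runtime). Each "green thread" is a generator, and `yield` is the explicit switch point where control returns to the scheduler:

```python
from collections import deque

def scheduler(tasks):
    # Round-robin user-space scheduler: run each task until it yields,
    # then re-queue it. No OS preemption is involved.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run the task up to its next yield
            queue.append(task)  # cooperative switch: back of the queue
        except StopIteration:
            pass                # task finished, drop it

def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # hand control back to the scheduler

log = []
scheduler([worker("a", 2, log), worker("b", 2, log)])
print(log)  # ['a0', 'b0', 'a1', 'b1']
```

Real runtimes (Go's scheduler, asyncio's event loop, gevent's hub) are far more sophisticated, but the core shape is this loop.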

Keep it simple.

CPython has a very strong stance in favor of simplicity. There are no Processing contexts, and there were no native Green threads until Python 3. It is built this way to stay simple to maintain in the long run.

By default, CPython is as simple as an OS thread :)

It is very efficient for single-threaded apps and very simple to explain and maintain, mostly because of the infamous Global Interpreter Lock (GIL). If you need some concurrency, the standard library offers 3 official options: threading, multiprocessing, and asyncio/coroutines. Asyncio is Python 3's native Green thread option; you can get similar results via gevent and others on Python 2.

The threading module lets you spawn OS Threads (M in MPG), which are cheap on CPU and RAM. But the GIL allows only one thread to use a CPU at a time. Plus, you carry the burden of putting locks around your critical sections yourself, because the OS can switch the running thread at any moment and corrupt non-atomic operations. So you get concurrency, but no parallelism.

CPython Threading: only one OS Thread can run at a time, the one that holds the GIL.
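A minimal sketch of the locking burden described above (the function name `work` and the thread/iteration counts are illustrative): four threads share one counter, and the increment must be guarded explicitly because the OS can preempt a thread mid-operation:

```python
import threading

counter = 0
lock = threading.Lock()

def work(n):
    global counter
    for _ in range(n):
        # counter += 1 is not atomic: read, add, write. Without the lock,
        # an OS thread switch between those steps can lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=work, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000
```

The threads interleave (concurrency), but the GIL ensures only one of them executes Python bytecode at any instant, so a CPU-bound job like this gains nothing from the extra threads.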

To solve the "no parallelism" problem, the multiprocessing module emerged, forking a new OS Process (M in MPG). This can truly run in parallel on computers with multiple CPUs, but objects in the forked OS Processes (M) cannot see each other and must communicate by other means: OS means only. Personally, I see it as an elegant hack that works for most cases, but it cannot share unpicklable things, generators for example. And compared to the threading option it is very costly in RAM, spawn time, and CPU spent on context switching.

Multiprocessing: one GIL per CPython process. But different processes cannot share unpicklable things, e.g. generators.

AsyncIO and gevent-ish options create Green threads (G in MPG) inside the single running OS Thread (M). This is the cheapest concurrency option in RAM, CPU, and code burden. With no OS preemption, user code needs almost no explicit locks: the concurrency is cooperative, and the application controls when green threads switch. However, only one Green thread can use a CPU at a time, as with OS threading. Again, this is CPU concurrency, not CPU parallelism.

AsyncIO & gevent: the main CPython thread schedules the internal Green threads to run, one at a time.
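A minimal asyncio sketch of that cooperative switching (assuming Python 3.7+ for `asyncio.run`; the coroutine name `tick` is illustrative): two coroutines interleave on a single OS thread, and each `await` is the explicit point where the event loop may run another green thread. Between awaits, no other coroutine can interrupt you, which is why explicit locks are rarely needed:

```python
import asyncio

async def tick(name, count, log):
    for i in range(count):
        log.append(f"{name}{i}")
        # The explicit, cooperative switch point: asyncio.sleep(0) yields
        # control so the event loop can resume the other coroutine.
        await asyncio.sleep(0)

async def main():
    log = []
    # Both coroutines run "concurrently" on one OS thread, one at a time.
    await asyncio.gather(tick("a", 2, log), tick("b", 2, log))
    return log

log = asyncio.run(main())
print(log)  # ['a0', 'b0', 'a1', 'b1']
```

Note the perfectly deterministic interleaving: switches happen only where the code says `await`, never at arbitrary instants chosen by the OS.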

Introducing the P of MPG

Processing context (P of MPG) is, in my humble opinion, THE largest difference of Go Runtime over the CPython options. Green threads (G) can move between Processing contexts (P) to rebalance the workload.

See: https://morsmachine.dk/go-scheduler. In the Go runtime, Green threads can move between Processing context queues to run earlier and in parallel.

Processing contexts (P) are movable between OS Threads (M) if the current thread needs to stop for any reason. And, with no GIL, as many Processing contexts as there are hardware CPUs can each have a running Green thread (G). This is true parallelism made cheap!

Go runtime: a Processing context (P) can switch to another OS Thread (M) if the current one is stopped, e.g. to perform a syscall.

Here be dragons

How can the Go runtime run things in parallel with no GIL and still be fast? Many have tried to build a GIL-less Python; many failed. Some indeed succeeded, but ended up with a very slow result.

The Go runtime has some internals very different from CPython's. Its garbage collector is a lot more complex than CPython's and is designed to run in parallel with user code. And there is no internal interpreter: Go compiles to native code.

Doing all of this in CPython would be very difficult to achieve without breaking the internal C APIs, and would be very hard to get accepted upstream.

Is there Python hope?

What about reimplementing just the Python language on top of the Go runtime? Sounds weird, but Go's complex concurrent, parallel, GIL-less machinery is already done and is being maintained and improved, right?

Oh, Batman, there is the Grumpy runtime, born at YouTube! They even published a benchmark!! And suddenly this whole blog post seems like disguised propaganda to get help on the open-source project https://github.com/grumpyhome/grumpy/. Too late, Robin, we have read the whole thing already!

:trollface:

Please help us. All of us. Unless there is a better way.