Every Python Programmer Should Know the Not-So-Secret ThreadPool

You are just a few lines of code away from orders-of-magnitude speedups with multithreading

Image by RÜŞTÜ BOZKUŞ from Pixabay

I first came across the necessity for parallelizing my code with Python when I had to run hundreds of external update operations on our CRM system without the option of batching them.

Each update operation would be submitted via an API call and then take about two to three seconds to process. Those updates would trigger processes in the CRM and sometimes throw errors.

The possibility of errors meant that I had to go through the motions countless times to make sure that everything finished to my satisfaction.

What made this endeavor take so excruciatingly long was the fact that after every single API call, my script would have to wait for a response before submitting the next API request.

A situation like this is a typical use case where multithreading (one form of concurrency available in Python) comes in very handy! In Python, there are, in essence, three forms of concurrency:

Multithreading — pre-emptive, via threading.

Cooperative multitasking — via asyncio.

Multiprocessing — via multiprocessing.
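The "not-so-secret" ThreadPool from the title lives in the multiprocessing.pool module. As a minimal sketch of the idea (the slow_task function and its inputs are illustrative, not from the original article):

```python
from multiprocessing.pool import ThreadPool
import time

def slow_task(n):
    # Stand-in for an I/O-bound operation, e.g. waiting on an API response
    time.sleep(0.1)
    return n * 2

# Run slow_task over all inputs using 4 worker threads;
# pool.map preserves the order of the inputs in its results.
with ThreadPool(processes=4) as pool:
    results = pool.map(slow_task, range(8))

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

The interface mirrors multiprocessing.Pool, so switching between threads and processes later is mostly a one-line change.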

The general advice is to use multiprocessing for CPU-bound problems (i.e., computationally intensive) and multithreading/multitasking for I/O-bound problems (i.e., waiting for input/output to finish).

Of course, there might be exceptions, and ultimately it comes down to the individual case at hand. In my experience, it makes sense to look into all options as soon as performance becomes critical.
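To see why multithreading pays off for I/O-bound work specifically, here is a small timing sketch. The 0.2-second sleep is an assumed stand-in for network latency; exact numbers will vary by machine:

```python
import time
from multiprocessing.pool import ThreadPool

def io_task(_):
    # Simulated network wait of ~200 ms; the thread releases the GIL
    # while sleeping, so other threads can run concurrently.
    time.sleep(0.2)

# Sequential: each call waits for the previous one to finish.
start = time.perf_counter()
for i in range(10):
    io_task(i)
sequential = time.perf_counter() - start

# Threaded: all 10 waits overlap across 10 worker threads.
start = time.perf_counter()
with ThreadPool(processes=10) as pool:
    pool.map(io_task, range(10))
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

For CPU-bound work the same comparison would show little to no gain, because CPython's GIL lets only one thread execute Python bytecode at a time.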

I set up a web API (AWS API Gateway + Lambda) that spits out motivational quotes, which we can “DoS” for benchmarking purposes.

Here’s a sample!