Today I want to talk about GenStage. Although this behaviour was introduced to the Elixir ecosystem a while ago, not every developer has tried it out, and there are a bunch of questions you'll probably face when dealing with GenStage for the first time.

Stages are data-exchange steps that send and/or receive data from other stages. When a stage sends data, it acts as a producer. When it receives data, it acts as a consumer.

I don't want to duplicate all the information available in GenStage's documentation. I'd rather emphasize some key aspects I've run into myself while implementing GenStage and running it in production.

Two usage approaches

There are various use cases for GenStage, but in general I split them (in my humble opinion) into two groups, depending on who owns the source of events (data). They are simple:

You own the event source (demand handling)

In this case it is your responsibility to fetch more events, so you can control the load and decide how to answer the demand received from Consumers. GenStage's buffer is only needed for some corner cases (for example, when your Consumers go down at the moment events are being dispatched).

An event source might be a database, a third-party application, an API, etc. The Producer fetches events only to cover the incoming demand from its Consumers.

Example: you need to import data from another application, and this application exposes an API that lets you fetch data in batches or pages.

Once your Consumers have handled a batch, they ask for more. The Producer then calls that external API to fetch exactly the requested amount of data and emits it as events, satisfying the Consumers' demand.
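To make this flow concrete, here is a minimal sketch of a demand-driven Producer. It is not taken from a real project: `PagedProducer` and `fetch_fun` are hypothetical names, and `fetch_fun` stands in for whatever client wraps the external API. The sketch also assumes the source always has enough data to cover the demand.

```elixir
# Assumption: this runs as a standalone script; inside a Mix project you would
# list :gen_stage in mix.exs instead of calling Mix.install/1.
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule PagedProducer do
  use GenStage

  # `fetch_fun` is a hypothetical stand-in for the external API client.
  # It takes an offset and a count and returns a list of at most `count` events.
  def start_link(fetch_fun) when is_function(fetch_fun, 2) do
    GenStage.start_link(__MODULE__, fetch_fun)
  end

  @impl true
  def init(fetch_fun) do
    {:producer, %{fetch: fetch_fun, offset: 0}}
  end

  # Invoked whenever Consumers ask for more events. The Producer fetches
  # exactly `demand` events, so it never holds more data than was requested.
  @impl true
  def handle_demand(demand, %{fetch: fetch, offset: offset} = state) when demand > 0 do
    events = fetch.(offset, demand)
    {:noreply, events, %{state | offset: offset + length(events)}}
  end
end
```

One thing this sketch leaves out: if the source returns fewer events than were demanded, the remaining demand stays unsatisfied and `handle_demand/2` is not called again for it. A production Producer would usually keep track of that pending demand in its state and retry the fetch later.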

If your GenStages were designed properly, there is no chance for a Producer to be overloaded by data, because it only fetches what has been asked for.

In this case GenStages are used mainly for parallel or even distributed data handling.

You don't own the event source (event pushing)

In this case you cannot control the events coming into a Producer, and therefore you cannot control the load. The Producer's buffer starts to play an important role as a built-in backpressure mechanism.

The Producer receives varying and unpredictable amounts of events from time to time, or serves an infinite data stream.

Example: you need to send push notifications in response to some events from your application (received from RabbitMQ, for example). You cannot predict how many push notifications will need to be sent. At one moment you have to handle only 10 push requests, but within the next few minutes that number can turn into 10000 (the marketing team did a great job and users started using your application like crazy).

In this case Consumers wait for work, and the Producer itself ignores the Consumers' demand: it receives incoming external events and dispatches them to the Consumers while trying not to die under the pressure.
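A push-based Producer along these lines could be sketched as follows. Again, this is an illustration under assumptions rather than production code: `PushProducer` and `push/1` are hypothetical names, the default buffer size of 10_000 is arbitrary, and `push/1` is meant to be called by whatever receives the external events (a RabbitMQ consumer callback, for instance).

```elixir
# Assumption: this runs as a standalone script; inside a Mix project you would
# list :gen_stage in mix.exs instead of calling Mix.install/1.
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule PushProducer do
  use GenStage

  def start_link(opts \\ []) do
    GenStage.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Hypothetical entry point for external events. It is asynchronous,
  # so the caller is never blocked by slow Consumers.
  def push(events) when is_list(events) do
    GenStage.cast(__MODULE__, {:push, events})
  end

  @impl true
  def init(opts) do
    # :buffer_size caps how many undelivered events the Producer keeps.
    # buffer_keep: :last keeps the most recent events when the cap is exceeded.
    buffer_size = Keyword.get(opts, :buffer_size, 10_000)
    {:producer, :ok, buffer_size: buffer_size, buffer_keep: :last}
  end

  @impl true
  def handle_cast({:push, events}, state) do
    # Emit immediately. GenStage dispatches as many events as the current
    # demand allows and keeps the rest in the Producer's internal buffer.
    {:noreply, events, state}
  end

  @impl true
  def handle_demand(_demand, state) do
    # Nothing to fetch here: GenStage hands buffered events to Consumers
    # on its own when new demand arrives.
    {:noreply, [], state}
  end
end
```

The trade-off in this sketch is that the buffer is bounded: once more than `:buffer_size` events are waiting, GenStage discards the excess according to `:buffer_keep`. How large the buffer should be, and whether losing events is acceptable at all, depends on your application.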

So here GenStages should be implemented mainly for backpressure purposes.