In this article, I will focus on creating a data ingestion and processing pipeline in Elixir using the Broadway [1] library. According to Broadway’s GitHub page, the library is used to:

“Build concurrent and multi-stage data ingestion and data processing pipelines with Elixir. It allows developers to consume data efficiently from different sources, known as producers, such as Amazon SQS, Apache Kafka, Google Cloud PubSub, RabbitMQ, and others.”

I will use the Broadway producer for RabbitMQ [2] to demonstrate its basic usage by developing a simple stock quote retrieval application.

P.S.: The example in this article is a contrived one and does not take full advantage of the library's features. In particular, the external API used to retrieve stock quotes allows only 5 calls per minute on its free tier, so the library's concurrency support is not utilized. In real applications, a concurrent data pipeline can increase throughput considerably, and that is probably what most applications need. Given the API limit, I will instead show how to use the rate-limiting feature that the library provides.

Major features of Broadway:

Back-pressure

Batching

Built-in testing

Ordering and partitioning

Rate-limiting

Metrics

You can read more about these features in the official Broadway documentation.

Our Application:

I will develop a stock quote retrieval application that uses RabbitMQ as the message broker between a producer, which publishes stock symbols, and a consumer, which consumes those symbols from RabbitMQ and retrieves their quotes. For a simple application like this, RabbitMQ may be overkill, but if the producer and consumer reside in different applications, we need a message queue to act as a broker between them.

Prerequisite: Set Up RabbitMQ

I installed RabbitMQ on a Mac with Homebrew using the commands below:

$ brew update
$ brew install rabbitmq

After starting the RabbitMQ service (for example, with brew services start rabbitmq), I used the command below to create the queue used by the sample application:

$ rabbitmqadmin declare queue name=stock_queue durable=true

The durable option tells RabbitMQ to keep the queue even if the server restarts. (Messages in it survive a restart only if they are also published as persistent.)

Once the queue is created, it can be listed as follows:

$ rabbitmqctl list_queues
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
name    messages
stock_queue    0

As we can see, stock_queue is created and currently there are zero messages in it.

Set Up the Elixir Project

At this stage, I created the Elixir application using the command below:

$ mix new broadway_stock --sup

I added the necessary dependencies to the mix.exs file:

defp deps do
  [
    {:httpoison, "~> 0.10.0"},
    {:hackney, github: "benoitc/hackney", override: true},
    {:json, "~> 1.0.0"},
    {:broadway_rabbitmq, "~> 0.6.0"}
  ]
end

After adding the dependencies, I installed them with the command below:

$ mix deps.get

I created a subdirectory called data under the project root and copied into it the CSV file that holds information about various stock symbols.

Producer Code

Our producer code parses the CSV file to extract the stock symbols and writes them to stock_queue in RabbitMQ. The full code is available in the sample application's repository [3]. In the producer code:

line 8–10: opens a channel to stock_queue

line 12–20: parses the CSV file and writes the stock symbols to stock_queue by calling the function write_queue

line 22: closes the connection to the queue

line 25–27: defines the write_queue function, which publishes the stock symbols to stock_queue
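The producer listing itself is not reproduced in this article, so here is a minimal sketch of the same steps, not the author's code. It assumes the CSV file has a header row with the symbol in its first column, that RabbitMQ runs locally with default credentials, and that publishing goes through the AMQP library (a dependency of broadway_rabbitmq). The module name BroadwayStock.ProducerSketch and the path data/stocks.csv are hypothetical. Publishing is passed in as a function so the CSV handling can be tried without a broker.

```elixir
# Sketch of the producer side, not the author's code. Assumptions: the CSV
# has a header row and the symbol in its first column; RabbitMQ runs locally
# with default credentials; AMQP comes in via broadway_rabbitmq.
defmodule BroadwayStock.ProducerSketch do
  @queue "stock_queue"

  # Pure part: the raw first field of every data row. Surrounding quotes are
  # kept; the consumer's transformer strips them later.
  def symbols_from_csv(csv) when is_binary(csv) do
    csv
    |> String.split(["\r\n", "\n"], trim: true)
    # Skip the header row.
    |> Enum.drop(1)
    |> Enum.map(fn line -> line |> String.split(",") |> hd() end)
  end

  # Calls publish_fun once per symbol and returns how many were sent.
  def dispatch_symbols(csv, publish_fun) when is_function(publish_fun, 1) do
    symbols = symbols_from_csv(csv)
    Enum.each(symbols, publish_fun)
    length(symbols)
  end

  # Side-effecting part: open a channel, make sure the durable queue exists
  # (declaring it again with the same arguments is harmless), publish each
  # symbol through the default exchange, and always close the connection.
  def dispatch(path \\ "data/stocks.csv") do
    {:ok, conn} = AMQP.Connection.open()

    try do
      {:ok, chan} = AMQP.Channel.open(conn)
      {:ok, _queue_info} = AMQP.Queue.declare(chan, @queue, durable: true)

      dispatch_symbols(File.read!(path), fn symbol ->
        :ok = AMQP.Basic.publish(chan, "", @queue, symbol)
      end)
    after
      AMQP.Connection.close(conn)
    end
  end
end
```

Leaving the quotes in place mirrors the article, where the consumer's transformer removes them; stripping them here instead would make that transformer unnecessary.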

Consumer Code:

Next, we define the consumer side of this producer-consumer relationship, and this is where Broadway comes into play. Its code is also part of the repository [3].

Here, I used the Broadway behaviour and defined its required functions and callbacks.

line 10–28: defines the start_link function that configures our pipeline. Some important things to note here: 1) I am using the same queue, stock_queue; 2) I defined a transformer to perform some data transformation before processing (more on this later); 3) since I will be calling an external API to fetch stock quotes and its free tier only allows 5 calls/min, I used rate limiting to allow only 1 message every 12 seconds (lines 17–20); and 4) since throughput is capped by the rate-limited external API, more concurrency would not help, so I am using only 1 processor (lines 22–26).

line 42–47: defines our transformer function transform/2. Typically, this function is used to perform any transformation needed before a message is processed by the handle_message callback. In this function, I just process the stock symbols coming out of the queue and remove the double quotes surrounding them.

line 31–40: defines the handle_message function, which takes one stock symbol at a time, calls the get_quote function to retrieve its quote, and displays it using the display_quote function.
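The consumer listing is not reproduced here either. The sketch below gathers the pieces described above: the options passed to Broadway.start_link/2 (queue, transformer, a rate limit of one message per 12 seconds derived from the 5 calls/min quota, and a single processor), the quote-stripping transformer, and a handle_message/3 body. In the application these functions would live in the module that calls use Broadway; they sit in a plain, hypothetically named module here so the pure parts can be tried without RabbitMQ. The prefetch_count of 1 and the {:ok, price} return shape of Api.Worker.get_quote/1 are assumptions, and option names follow recent Broadway releases.

```elixir
# Sketch of the consumer's pipeline pieces, not the author's code. In the app
# they belong in the module that calls `use Broadway` (BroadwayStock.Worker);
# a plain module is used here so the pure parts run without RabbitMQ.
defmodule BroadwayStock.ConsumerSketch do
  # Free tier: 5 calls per minute, so at most one message every
  # 60_000 / 5 = 12_000 milliseconds.
  @calls_per_minute 5
  @interval_ms div(60_000, @calls_per_minute)

  # Options for Broadway.start_link/2.
  def options do
    [
      name: BroadwayStock.Worker,
      producer: [
        module: {BroadwayRabbitMQ.Producer, queue: "stock_queue", qos: [prefetch_count: 1]},
        # Broadway calls BroadwayStock.Worker.transform(event, []) per event.
        transformer: {BroadwayStock.Worker, :transform, []},
        concurrency: 1,
        # Shared by all producers: 1 message per 12-second interval.
        rate_limiting: [allowed_messages: 1, interval: @interval_ms]
      ],
      # One processor: the API quota, not the CPU, limits throughput.
      processors: [default: [concurrency: 1]]
    ]
  end

  # "\"EOD\"" -> "EOD"
  def strip_quotes(symbol) when is_binary(symbol), do: String.trim(symbol, "\"")

  # Transformer: the RabbitMQ producer already emits %Broadway.Message{}
  # structs, so only their data field is rewritten.
  def transform(message, _opts), do: Broadway.Message.update_data(message, &strip_quotes/1)

  # Body of handle_message/3. Api.Worker.get_quote/1 is assumed to return
  # {:ok, price}; if the match fails, Broadway marks the message as failed.
  def handle_message(_processor, message, _context) do
    {:ok, price} = Api.Worker.get_quote(message.data)
    IO.puts("#{message.data} - $#{price}")
    # Returning the message lets Broadway acknowledge it to RabbitMQ.
    message
  end
end
```

In the real pipeline module, start_link/1 would simply call Broadway.start_link(__MODULE__, options()). Deriving the interval from the quota means a different plan limit only requires changing one constant.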

Retrieve Stock Quotes

The api_worker.ex file contains all the code needed to fetch a stock quote for a given stock symbol. It is a straightforward implementation of Elixir’s GenServer behaviour (see the repository [3]).

One thing to note: I left the API key from https://www.alphavantage.co in this file for simplicity. In production-ready code, the key should not live in the source; it should be read from configuration that is populated through environment variables.
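Following that advice, a GenServer that reads the key from an environment variable at startup might look like the sketch below. It assumes Alpha Vantage's GLOBAL_QUOTE endpoint, whose JSON response nests the price under "Global Quote" and "05. price", plus HTTPoison and the json package from mix.exs. The module name Api.WorkerSketch and the variable ALPHAVANTAGE_API_KEY are hypothetical, and this is not the author's api_worker.ex.

```elixir
# Sketch of an Alpha Vantage quote worker, not the author's api_worker.ex.
# Assumptions: HTTPoison and the json package from mix.exs, the GLOBAL_QUOTE
# endpoint, and a hypothetical ALPHAVANTAGE_API_KEY environment variable.
defmodule Api.WorkerSketch do
  use GenServer

  @base_url "https://www.alphavantage.co/query"

  def start_link(_opts) do
    # Fail fast at boot if the key is missing instead of hard-coding it.
    api_key = System.fetch_env!("ALPHAVANTAGE_API_KEY")
    GenServer.start_link(__MODULE__, api_key, name: __MODULE__)
  end

  # The HTTP request can be slow, so allow more than the default 5 s.
  def get_quote(symbol), do: GenServer.call(__MODULE__, {:get_quote, symbol}, 15_000)

  @impl true
  def init(api_key), do: {:ok, api_key}

  @impl true
  def handle_call({:get_quote, symbol}, _from, api_key) do
    reply =
      with {:ok, %{status_code: 200, body: body}} <- HTTPoison.get(quote_url(symbol, api_key)),
           {:ok, decoded} <- JSON.decode(body) do
        extract_price(decoded)
      else
        {:ok, %{status_code: status}} -> {:error, {:http_status, status}}
        {:error, reason} -> {:error, reason}
      end

    {:reply, reply, api_key}
  end

  def quote_url(symbol, api_key) do
    query = URI.encode_query(function: "GLOBAL_QUOTE", symbol: symbol, apikey: api_key)
    @base_url <> "?" <> query
  end

  # Any body without a "Global Quote" price (for example, when the quota is
  # exceeded) becomes an error tuple instead of crashing the worker.
  def extract_price(%{"Global Quote" => %{"05. price" => price}}), do: {:ok, price}
  def extract_price(other), do: {:error, {:unexpected_response, other}}
end
```

Keeping URL building and response parsing in plain functions makes them easy to test without network access; only handle_call/3 touches HTTP.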

Supervision Tree:

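Finally, we need to populate our supervision tree in the application.ex file as follows:

children = [
  # Api.Worker goes first: the pipeline calls it from handle_message, so it
  # should be running before Broadway starts consuming from stock_queue.
  {Api.Worker, []},
  # Starts a worker by calling: BroadwayStock.Worker.start_link(arg)
  {BroadwayStock.Worker, []}
]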

Our application supervisor will supervise both BroadwayStock.Worker and Api.Worker.

Output:

I ran the commands below to compile the application and load it in iex:

$ mix compile
$ iex -S mix

Followed by:

iex> BroadwayStock.dispatch()

This should produce one line of output every 12 seconds, similar to the following:

EOD — $3.91

WELL — $44.40

WCC — $23.54

WST — $158.14

WAL — $32.03

WALA — $23.00
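Each line pairs a symbol with its price rounded to two decimal places. Alpha Vantage typically returns prices as strings with four decimals (for example "158.1400"), so a display_quote-style helper has to parse and round them. Below is a minimal sketch assuming that string format; the module name is hypothetical and a plain hyphen is used as the separator.

```elixir
# Hypothetical helper; assumes prices arrive as strings such as "158.1400".
defmodule BroadwayStock.DisplaySketch do
  # "WST", "158.1400" -> "WST - $158.14"
  def format_quote(symbol, price) when is_binary(price) do
    {value, _rest} = Float.parse(price)
    "#{symbol} - $" <> :erlang.float_to_binary(value, decimals: 2)
  end

  def display_quote(symbol, price), do: IO.puts(format_quote(symbol, price))
end
```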

Batch Processing

A Broadway pipeline can also process messages in batches. I have not used batching in this example since there is not much scope for it here, but the official documentation explains how to set it up: https://hexdocs.pm/broadway/rabbitmq.html
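For reference, batching is enabled by adding a :batchers option to Broadway.start_link/2 and implementing the handle_batch/4 callback. The fragment below is a sketch with hypothetical values (batches of up to 10 messages, flushed after at most 2 seconds); it belongs inside the use Broadway module and is not used in this app.

```elixir
# Hypothetical addition to the pipeline module (not used in this app).
# In Broadway.start_link/2, next to :producer and :processors:
#
#     batchers: [default: [batch_size: 10, batch_timeout: 2_000]]
#
# Messages successfully returned from handle_message/3 go to the :default
# batcher, which emits a batch once 10 messages are collected or 2 seconds
# have passed, whichever comes first.
@impl true
def handle_batch(:default, messages, _batch_info, _context) do
  symbols = Enum.map(messages, & &1.data)
  IO.puts("Processed batch: #{Enum.join(symbols, ", ")}")
  # Return the messages so Broadway can acknowledge them.
  messages
end
```

With the 12-second rate limit used in this app, a 2-second timeout would usually flush single-message batches, which illustrates why batching adds little here.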

Concluding remarks

Even though I did not explore the full power of Broadway with this simple app, I hope it helps some readers get started with the library. The full code for the sample application is available on my GitHub: https://github.com/imeraj/Elixir_Playground/tree/master/broadway_stock

For more elaborate and in-depth technical posts in the future, please follow me here or on Twitter.

References:

[1] Broadway — https://github.com/dashbitco/broadway

[2] A Broadway producer for RabbitMQ — https://github.com/dashbitco/broadway_rabbitmq

[3] BroadwayStock app’s source code — https://github.com/imeraj/Elixir_Playground/tree/master/broadway_stock