Background

In February 2017 I was in Cape Town on vacation and got bored at some point. To be honest, my credit card was skimmed in a sketchy bar on Long Street (don’t go there), which limited my budget, disrupted my travel plans and tied me to the city. So I thought it would be nice to build a little weekend project with Elixir’s GenStage, since it had recently been introduced and I hadn’t gotten around to building something with it.

I chose Twitter as a data source because their Streaming API provides a constant stream of data that could possibly overload a system, which is exactly one of the things GenStage promised to handle well. I chose Angular because I like working with it and I didn’t find that many articles on how to wire Phoenix Channels together with an Angular front end.

The basic idea was to filter a stream of tweets and display the result on the front end as fast as possible, with few resources and without ever crashing the system. I generally like things that are palpable and fun but have sophisticated technology under the hood. So adding emojis and a map to the mix made a lot of sense.

If you aren’t fond of reading the whole article: I published the whole project on GitHub. You are free to use and alter it. And if you spot anything that could be done better, please submit a PR!

What does this app do?

Basically, it shows, on a map and in (almost) real time, all tweets that have a geolocation attached and contain an emoji. The map shows the first emoji of each tweet, and when you click on it you can read the whole tweet. It also shows a snapshot statistic of the worldwide emoji usage on Twitter since you visited the page.

It doesn’t accumulate anything and stores nothing in a database. It always only shows you what is happening right now. And it is really fast. Try it yourself at http://emojimap.ospaarmann.com/. Open the map, zoom in on any location, take your phone and send a tweet that is geotagged at this location and contains an emoji.

Back in the day

Let’s just remember how we would build something like this without Streams, OTP, and WebSockets. We would have one backend constantly polling Twitter’s REST API for new tweets (possibly hitting the rate limits). The tweets fulfilling our criteria would have to go into a database — maybe Redis — because the polling would be done by some background job that doesn’t directly reply to requests from the clients. The clients would constantly poll the backend (possibly overloading and crashing it) for new tweets. But it would be difficult to decide what data we should send back since every client started polling at a different time and had possibly already received a different set of tweets. We would need much more server power, we would have more moving parts, it would be quite difficult to design and we would have to wait at least a couple of seconds between sending a tweet and seeing it pop up on the map.

Why GenStage?

I highly recommend starting by reading Announcing GenStage and this great article by Mario Flach. Basically, GenStage is a behaviour built on top of OTP or, more precisely, a variation on Elixir’s GenServer that is designed with back-pressure in mind, so that the consumer of messages is not overwhelmed by the volume of messages. In other words: it is a demand-driven system where the consumer only receives as many messages as she can handle, and only when she asks for them.

To start the flow of events, we subscribe consumers to producers. Once the communication channel between them is established, consumers will ask the producers for events. We typically say the consumer is sending demand upstream. Once demand arrives, the producer will emit items, never emitting more items than the consumer asked for. This provides a back-pressure mechanism.

GenStage documentation

This comes in very handy when we want to handle a stream of data where we don’t really know how many messages are going to hit us. Like, say, a stream of tweets with funny faces.

In GenStage we have producers (or broadcasters) that emit events, consumers that consume events, and producer-consumers that do both and typically sit somewhere between a producer and a consumer. I don’t want to get into too much detail here, so I recommend reading the mentioned articles and the documentation. I will also put a list of references at the end of this article.
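For a quick taste of what demand-driven means in practice, here is a minimal sketch modelled on the counter example from the GenStage documentation. It is not code from the emoji map project: the module names `DemandSketch.Counter` and `DemandSketch.Collector` are made up for illustration, and the script assumes it is run standalone so that `Mix.install` can fetch GenStage from Hex.

```elixir
# Assumption: run as a standalone script; Mix.install fetches GenStage from Hex.
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule DemandSketch.Counter do
  use GenStage

  @impl true
  def init(first), do: {:producer, first}

  # Only called when a consumer asks for events; emits exactly `demand` integers.
  @impl true
  def handle_demand(demand, next) when demand > 0 do
    events = Enum.to_list(next..(next + demand - 1))
    {:noreply, events, next + demand}
  end
end

defmodule DemandSketch.Collector do
  use GenStage

  @impl true
  def init(parent), do: {:consumer, parent}

  # Forward each batch to the parent process so we can look at it.
  @impl true
  def handle_events(events, _from, parent) do
    send(parent, {:events, events})
    {:noreply, [], parent}
  end
end

{:ok, producer} = GenStage.start_link(DemandSketch.Counter, 0)
{:ok, consumer} = GenStage.start_link(DemandSketch.Collector, self())

# The consumer never asks for more than 4 events at a time.
{:ok, _subscription} =
  GenStage.sync_subscribe(consumer, to: producer, max_demand: 4, min_demand: 2)

first_batch =
  receive do
    {:events, batch} -> batch
  after
    1_000 -> []
  end

IO.inspect(first_batch, label: "first batch")

# Stop both stages so the counter does not keep running forever.
GenStage.stop(consumer)
GenStage.stop(producer)
```

The producer has no say in how fast it emits: `handle_demand/2` only runs when the consumer has asked for more, and `max_demand` caps how many events can be in flight.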

The moving parts — an overview

I’m not a graphic designer and I’m almost color blind, so shut up.

On the Elixir side, we have the TwitterStream that handles the connection to the Twitter Streaming API (via the ExTwitter client library). It receives the tweets and only keeps the ones containing an emoji and geo-coordinates. I started off with a very crude regex but then discovered Exmoji, the self-proclaimed Swiss Army knife for handling emojis in Elixir and a great sign that the Elixir community is growing like crazy.

We then have the TweetBufferFiller which is a GenServer that encapsulates the TwitterStream, starts it and notifies the TweetBroadcaster about new tweets. This is pretty easy since the TwitterStream module receives an Elixir Stream from ExTwitter, applies filters and some normalization and returns a Stream again. So I can use Stream.map/2 to call TweetBroadcaster.sync_notify/1 for every tweet in the stream.

The TweetBroadcaster and the TweetConsumer are the producer and consumer of our GenStage pipeline. They make sure that the system doesn’t overload. Whenever a new tweet arrives, TweetBroadcaster is notified. It buffers the tweets until TweetConsumer asks for more, which happens whenever the TweetConsumer is done pushing the previous batch down to the front end. The TweetBroadcaster also stores unmet demand in case there are not enough tweets available at some point. The TweetConsumer pushes the tweets to the clients via WebSocket or, in our case, the Phoenix implementation of WebSocket: Phoenix Channels.
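The interplay of buffered tweets and stored demand is easier to see without any processes involved. The following is a GenStage-free sketch of that bookkeeping, assuming a state of `{queue, pending_demand}` as in the broadcaster example from the GenStage documentation. `BufferSketch` and its function names are hypothetical and not taken from the project.

```elixir
defmodule BufferSketch do
  @moduledoc "Hypothetical, GenStage-free model of a buffering producer's state."

  # State: {queue of buffered tweets, demand that could not be served yet}.
  def new, do: {:queue.new(), 0}

  # A new tweet arrives: buffer it, then emit whatever stored demand allows.
  def notify({queue, pending}, tweet), do: dispatch(:queue.in(tweet, queue), pending, [])

  # The consumer asks for `demand` more tweets: add it to stored demand, then emit.
  def ask({queue, pending}, demand), do: dispatch(queue, pending + demand, [])

  # No demand left: return what we emitted and keep the rest buffered.
  defp dispatch(queue, 0, out), do: {Enum.reverse(out), {queue, 0}}

  defp dispatch(queue, demand, out) do
    case :queue.out(queue) do
      {{:value, tweet}, rest} -> dispatch(rest, demand - 1, [tweet | out])
      # Buffer is empty: remember the demand we could not serve.
      {:empty, rest} -> {Enum.reverse(out), {rest, demand}}
    end
  end
end

state = BufferSketch.new()

# No demand yet, so both tweets are only buffered.
{[], state} = BufferSketch.notify(state, "🍕")
{[], state} = BufferSketch.notify(state, "🌍")

# The consumer asks for three: two are buffered, one unit of demand is stored.
{emitted, state} = BufferSketch.ask(state, 3)
IO.inspect(emitted, label: "emitted")

# The stored demand serves the next tweet as soon as it arrives.
{late, _state} = BufferSketch.notify(state, "🎉")
IO.inspect(late, label: "late")
```

In a real GenStage producer the same two entry points would be the callback that handles the notification and `handle_demand/2`, with the emitted list returned as the events of the `{:noreply, events, state}` tuple.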

The front end is written in Angular 2. That is outdated now but well, that doesn’t really matter for this purpose. It connects on startup with the Phoenix Channel and then places a marker with an emoji as the symbol and the tweet as the content of a popup on a styled MapBox map. It always only shows the latest 700 tweets to not overload the browser (it is already a bit laggy like this). There is also a little statistic where the use of every emoji is counted and displayed (the sort order changes only every 5 seconds to save CPU).

That’s it. Not that complicated, huh?

The moving parts — in detail

In this part of the article I will discuss the most important parts of the code in greater detail. If you prefer to read the code yourself, head over to my GitHub repo and skip this part. But it might be helpful and interesting. Also: If I did something silly please tell me in the comments or on GitHub!

EmojiMap.TwitterStream — lib/emoji_map/twitter_stream.ex

The main function here is get_emoji_stream/0. It starts the Twitter stream and does normalisation and filtering. The first little trick here is that I only want tweets with a geolocation. I cannot pass this as an option to Twitter and I don’t want to do the filtering on my side. So I use ExTwitter.stream_filter/1 and pass a bounding box that covers the whole planet as an option. This way I only get geotagged tweets. But some only have a location attached, while others have an actual lat/long geolocation. The rest of the code takes care of that.

I then pass the stream, which is an Elixir stream, into a pipeline of functions to filter and normalise the tweets further. I can handily use Stream.filter/2 and Stream.map/2 for that. The main function get_emoji_stream/0 finally returns a stream of filtered and normalised tweets.
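To illustrate the shape of such a pipeline, here is a small sketch that runs on plain maps instead of real tweets. Everything in it is an assumption made for the example: the `PipelineSketch` module, the `:text` and `:coordinates` keys, and the crude regex, which only covers a few common emoji ranges and merely stands in for the Exmoji lookup used in the project.

```elixir
defmodule PipelineSketch do
  # Lazily filter and normalise an enumerable of tweet-like maps.
  def emoji_stream(tweets) do
    tweets
    |> Stream.filter(&has_coordinates?/1)
    |> Stream.map(&normalise/1)
    |> Stream.filter(fn tweet -> tweet.emoji != nil end)
  end

  defp has_coordinates?(%{coordinates: {lat, lng}}) when is_number(lat) and is_number(lng),
    do: true

  defp has_coordinates?(_tweet), do: false

  defp normalise(%{text: text, coordinates: {lat, lng}}) do
    %{text: text, lat: lat, lng: lng, emoji: first_emoji(text)}
  end

  # Crude stand-in for a proper emoji library: a few common ranges only.
  defp emoji_regex, do: ~r/[\x{1F300}-\x{1FAFF}\x{2600}-\x{27BF}]/u

  defp first_emoji(text) do
    case Regex.run(emoji_regex(), text) do
      [emoji | _] -> emoji
      nil -> nil
    end
  end
end

tweets = [
  %{text: "Pizza time 🍕", coordinates: {-33.92, 18.42}},
  %{text: "No emoji here", coordinates: {52.52, 13.4}},
  %{text: "No location 🌍", coordinates: nil}
]

result = tweets |> PipelineSketch.emoji_stream() |> Enum.to_list()
IO.inspect(result)
```

Because every step is a `Stream` function, nothing is computed until something consumes the result, which is what makes the same pipeline usable on an endless stream of tweets.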

EmojiMap.TweetBufferFiller — lib/emoji_map/tweet_buffer_filler.ex

This module ties everything together. It starts EmojiMap.TweetBroadcaster and EmojiMap.TweetConsumer, the producer and the consumer. This pair is responsible for the stability of the system via GenStage and back-pressure (remember?).

It also calls EmojiMap.TwitterStream.get_emoji_stream/0, receives an Elixir stream with filtered and normalised tweets, calls Stream.map/2 on it and notifies the TweetBroadcaster about every incoming tweet.
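One detail worth keeping in mind with this approach: Stream.map/2 is lazy, so the notifications only happen once something consumes the stream. The sketch below shows the idea with a short list in place of the tweet stream. `FillerSketch` and the `notify` function are hypothetical stand-ins, not the project's actual code.

```elixir
defmodule FillerSketch do
  # `notify` stands in for the broadcaster's notify function (hypothetical here).
  def fill(tweet_stream, notify) do
    tweet_stream
    |> Stream.map(fn tweet -> notify.(tweet) end)
    # Stream.map/2 alone does nothing; Stream.run/1 forces the stream.
    |> Stream.run()
  end
end

parent = self()

notify = fn tweet ->
  send(parent, {:notified, tweet})
  :ok
end

# Building the lazy stream sends no message yet.
_lazy = Stream.map(["🍕", "🌍"], notify)
{:message_queue_len, before_run} = Process.info(self(), :message_queue_len)

# Running it notifies once per tweet, in order.
:ok = FillerSketch.fill(["🍕", "🌍"], notify)
{:message_queue_len, after_run} = Process.info(self(), :message_queue_len)

IO.inspect(after_run - before_run, label: "notifications sent")
```

With an endless stream, the call that forces it never returns, so it presumably has to live in its own process rather than block the caller.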

It is also the module that is started in the supervisor tree when the application boots. So as soon as the application starts, everything is up and running. See lib/emoji_map.ex for that.

EmojiMap.TweetBroadcaster — lib/emoji_map/tweet_broadcaster.ex

This follows the standard practice pretty closely. Nothing fancy here.

EmojiMap.TweetConsumer — lib/emoji_map/tweet_consumer.ex

The consumer finally receives the filtered, normalised tweets from the broadcaster and sends them to the frontend via WebSocket or Phoenix Channel. The only important thing here is: Don’t confuse EmojiMap.Endpoint.broadcast with a part of the GenStage system. It is just Phoenix.Endpoint.broadcast/3 and sends a message to all subscribers of a channel.
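A consumer of this kind can be sketched in a few lines. This is an illustration under assumptions, not the project's module: `ConsumerSketch`, the topic "tweets:lobby" and the event name "new_tweet" are invented, the broadcast function is injected so the script runs without Phoenix, and `GenStage.from_enumerable/1` plays the role of the broadcaster.

```elixir
# Assumption: run as a standalone script; Mix.install fetches GenStage from Hex.
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule ConsumerSketch do
  use GenStage

  def start_link(broadcast), do: GenStage.start_link(__MODULE__, broadcast)

  @impl true
  def init(broadcast), do: {:consumer, broadcast}

  # Push every tweet of the batch to the channel topic. Once this returns,
  # GenStage can send new demand upstream.
  @impl true
  def handle_events(tweets, _from, broadcast) do
    Enum.each(tweets, fn tweet -> broadcast.("tweets:lobby", "new_tweet", tweet) end)
    {:noreply, [], broadcast}
  end
end

parent = self()

# Stand-in with the same argument order as Phoenix.Endpoint.broadcast/3.
fake_broadcast = fn topic, event, tweet -> send(parent, {:pushed, topic, event, tweet}) end

{:ok, producer} = GenStage.from_enumerable([%{emoji: "🍕"}, %{emoji: "🌍"}])
{:ok, consumer} = ConsumerSketch.start_link(fake_broadcast)

# :transient keeps the consumer alive when the finite producer exits normally.
{:ok, _subscription} = GenStage.sync_subscribe(consumer, to: producer, cancel: :transient)

pushed =
  for _ <- 1..2 do
    receive do
      {:pushed, topic, event, tweet} -> {topic, event, tweet}
    after
      1_000 -> :timeout
    end
  end

IO.inspect(pushed)
```

In the real application the injected function would simply be the endpoint's broadcast, which knows nothing about GenStage and just fans the message out to every subscriber of the topic.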

That’s it & Thank you

If you have any questions, feedback, critique: Please just comment or send me a message. I hope this article was helpful in understanding GenStage. And maybe this is a fun idea and some of you want to do something with the pretty imperfect code I threw out there.

If you want to learn more about our work at Tech Team Berlin, check out our website.

I would like to give a special thanks to my friends over at DigitalOcean, where I host my demo, and especially to Hollie Haggans. Super simple setup. Everything runs inside Docker containers on a single $20 droplet. Pretty neat. If you want to check it out, consider using my referral link to get $10 free credit.

❤️ Thanks.