As developers, many of us build breakable toys in various programming languages when learning something new; the goal is to learn the language and not the intricacies of a new product, after all! Some of us reach for todo apps, others a messaging clone. If you’re like me, though, it’s URL shorteners.

I built my most recent URL shortener in Elixir, and decided to see how much raw speed I could get out of the application by using GenServer - Generic Server - to build out a key-value store to bypass the database when possible.

A GenServer is one mechanism for managing state in Elixir, so it’s a perfect fit for a transient, in-memory cache. In this example, I’ll be caching the struct straight from Ecto and using the slug from the URL as the cache key.

Now that we understand the underlying mechanism we’ll be using, the cache behavior is the next step:

```elixir
# lib/link_cache/cache.ex
defmodule LinkCache.Cache do
  def fetch(slug, default_value_function) do
    case get(slug) do
      {:not_found} -> set(slug, default_value_function.())
      {:found, result} -> result
    end
  end
end
```

We’ll use LinkCache.Cache.fetch/2 to provide both a slug for lookup and a function to execute to populate the cache if no value is found. While LinkCache.Cache.get/1 and LinkCache.Cache.set/2 don’t exist yet, we can infer behavior: LinkCache.Cache.get/1 will either return a tuple of {:not_found} (at which point we’ll call LinkCache.Cache.set/2 to assign the value and return it), or {:found, result} , at which point we return the result directly.

To leverage this, let’s change RedirectTo.RedirectController :

```diff
--- a/web/controllers/redirect_controller.ex
+++ b/web/controllers/redirect_controller.ex
@@ -3,7 +3,9 @@ defmodule RedirectTo.RedirectController do
   alias RedirectTo.Queries

   def show(conn, %{"slug" => slug}) do
-    link = Queries.Link.by_slug(slug) |> Repo.one
+    link = LinkCache.Cache.fetch(slug, fn ->
+      Queries.Link.by_slug(slug) |> Repo.one
+    end)

     redirect conn, external: link.long_url
   end
```

With our usage out of the way, let’s build out the rest of LinkCache.Cache .

```elixir
# lib/link_cache/cache.ex
defmodule LinkCache.Cache do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, [
      {:ets_table_name, :link_cache_table},
      {:log_limit, 1_000_000}
    ], opts)
  end

  def fetch(slug, default_value_function) do
    case get(slug) do
      {:not_found} -> set(slug, default_value_function.())
      {:found, result} -> result
    end
  end

  defp get(slug) do
    case GenServer.call(__MODULE__, {:get, slug}) do
      [] -> {:not_found}
      [{_slug, result}] -> {:found, result}
    end
  end

  defp set(slug, value) do
    GenServer.call(__MODULE__, {:set, slug, value})
  end

  # GenServer callbacks

  def handle_call({:get, slug}, _from, state) do
    %{ets_table_name: ets_table_name} = state

    result = :ets.lookup(ets_table_name, slug)
    {:reply, result, state}
  end

  def handle_call({:set, slug, value}, _from, state) do
    %{ets_table_name: ets_table_name} = state

    true = :ets.insert(ets_table_name, {slug, value})
    {:reply, value, state}
  end

  def init(args) do
    [{:ets_table_name, ets_table_name}, {:log_limit, log_limit}] = args

    :ets.new(ets_table_name, [:named_table, :set, :private])

    {:ok, %{log_limit: log_limit, ets_table_name: ets_table_name}}
  end
end
```

We’ve already seen LinkCache.Cache.fetch/2 , so let’s move on to LinkCache.Cache.get/1 and LinkCache.Cache.set/2 .

```elixir
defp get(slug) do
  case GenServer.call(__MODULE__, {:get, slug}) do
    [] -> {:not_found}
    [{_slug, result}] -> {:found, result}
  end
end
```

GenServer.call/3 is a synchronous function we use to send a request to the server ( __MODULE__ , which is equivalent to LinkCache.Cache ). The 2-tuple {:get, slug} (which we’ll pattern-match on) tells the server to retrieve the value with the key slug , and will return zero or one results due to the underlying storage mechanism we’re using.

```elixir
defp set(slug, value) do
  GenServer.call(__MODULE__, {:set, slug, value})
end
```

Here, we’re performing another synchronous action to cache the value for the specific slug ; the server stores the 2-tuple {slug, value} in ETS and replies with the value itself.

Both LinkCache.Cache.get/1 and LinkCache.Cache.set/2 send requests to __MODULE__ , as discussed before; we now need to handle those requests inside the server, pattern-matching on each message to trigger the underlying behavior.

Above, we’d seen GenServer.call/3 used, with the appropriate tuples being passed; let’s look at the hooks we’ve written to handle these requests:

```elixir
def handle_call({:get, slug}, _from, state) do
  %{ets_table_name: ets_table_name} = state

  result = :ets.lookup(ets_table_name, slug)
  {:reply, result, state}
end

def handle_call({:set, slug, value}, _from, state) do
  %{ets_table_name: ets_table_name} = state

  true = :ets.insert(ets_table_name, {slug, value})
  {:reply, value, state}
end
```

GenServer.handle_call/3 receives the request (either {:get, slug} or {:set, slug, value} ), a 2-tuple identifying the request sender (a PID and a unique reference), which we’re ignoring here ( _from ), and the server’s internal state ( state ). Both clauses reply with a 3-tuple {:reply, result, state} - with the cached link struct available, we can then retrieve the URL and send the user on their way.

LinkCache.Cache.get/1 and LinkCache.Cache.set/2 each reference :ets , the Erlang module that exposes ETS (Erlang Term Storage), an incredibly performant in-memory store offering O(1) lookups for the “set” table type we’re using.
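If you haven’t worked with ETS directly, its core API is tiny. A minimal, standalone sketch of the operations the cache relies on (the table and key names here are illustrative, not the application’s):

```elixir
# A standalone ETS demo; table and data are made up for illustration.
table = :ets.new(:demo_table, [:named_table, :set, :public])

# :ets.insert/2 stores a {key, value} tuple; with :set storage, a second
# insert with the same key overwrites the first.
true = :ets.insert(table, {"abc", "http://www.example.com"})

# :ets.lookup/2 returns a list of matching tuples -- at most one for :set.
[{"abc", "http://www.example.com"}] = :ets.lookup(table, "abc")

# A miss returns an empty list, which our get/1 maps to {:not_found}.
[] = :ets.lookup(table, "missing")
```

The empty-list-vs-one-element-list shape is exactly what the `case` in LinkCache.Cache.get/1 pattern-matches on.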

In both definitions, we pattern-match on state to retrieve the table name for our link cache and perform both insert (for LinkCache.Cache.set/2 ) and lookup (for LinkCache.Cache.get/1 ) operations.

We configure :ets in LinkCache.Cache.start_link/1 and LinkCache.Cache.init/1 :

```elixir
def start_link(opts \\ []) do
  GenServer.start_link(__MODULE__, [
    {:ets_table_name, :link_cache_table},
    {:log_limit, 1_000_000}
  ], opts)
end

def init(args) do
  [{:ets_table_name, ets_table_name}, {:log_limit, log_limit}] = args

  :ets.new(ets_table_name, [:named_table, :set, :private])

  {:ok, %{log_limit: log_limit, ets_table_name: ets_table_name}}
end
```

These functions prepare LinkCache.Cache to be supervised ( GenServer.start_link/3 ) with its configuration (a table named :link_cache_table and a log limit of 1,000,000 entries - the limit is stored in state but not otherwise enforced here), and create the table itself ( :ets.new/2 ), configured as a named table with “set” storage.

With the underlying cache built, we now want to ensure Phoenix keeps the cache running. Elixir promotes a strategy of letting things crash - and ensuring recovery is quick and straightforward - with Supervisors at the helm.

Let’s build out a supervisor to ensure LinkCache.Cache is monitored and runs correctly:

```elixir
# lib/link_cache/supervisor.ex
defmodule LinkCache.Supervisor do
  use Supervisor

  def start_link do
    Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  def init(:ok) do
    children = [
      worker(LinkCache.Cache, [[name: LinkCache.Cache]])
    ]

    supervise(children, strategy: :one_for_one)
  end
end
```

Our LinkCache.Supervisor supervises only LinkCache.Cache , with a :one_for_one strategy: if the cache process dies, the supervisor restarts that process alone.

We’ll also need to update our application to include this supervisor:

```elixir
# lib/redirect_to.ex
defmodule RedirectTo do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [
      supervisor(RedirectTo.Endpoint, []),
      supervisor(RedirectTo.Repo, []),
      supervisor(LinkCache.Supervisor, []),
    ]

    opts = [strategy: :one_for_one, name: RedirectTo.Supervisor]
    Supervisor.start_link(children, opts)
  end

  # ...
end
```

So, in addition to our RedirectTo.Endpoint and RedirectTo.Repo , our LinkCache.Supervisor will also be managed by RedirectTo ; this ensures if anything happens to the cache, our application will restart it.

You can find a more thorough walkthrough of supervisors, supervision trees, and how to tie various components together on Elixir’s discussion of OTP topics.

To test the cache, I wrote a small test ensuring the cache sets a value based on the return value of a function, and reuses that value on a subsequent lookup.

```elixir
# test/lib/link_cache/cache_test.exs
defmodule LinkCacheTest do
  use ExUnit.Case

  test "caches and finds the correct data" do
    assert LinkCache.Cache.fetch("A", fn ->
      %{id: 1, long_url: "http://www.example.com"}
    end) == %{id: 1, long_url: "http://www.example.com"}

    assert LinkCache.Cache.fetch("A", fn -> "" end) ==
      %{id: 1, long_url: "http://www.example.com"}
  end
end
```

Again, the cache can store Structs, Maps, or all sorts of data structures. As is the case with any cache, invalidation might be tricky. The architecture of this application is such that the links themselves cannot be updated, so I’m not concerned about invalid data. Your mileage may vary.
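If your links could change, the simplest invalidation would be dropping the stale entry with :ets.delete/2 so the next fetch repopulates from the database. A minimal, hypothetical sketch (the table here is illustrative and :public ; in the application the table is :private , so a real implementation would route the delete through a new handle_call clause in LinkCache.Cache ):

```elixir
# Hypothetical invalidation sketch -- not part of the application above.
table = :ets.new(:invalidation_demo, [:set, :public])

true = :ets.insert(table, {"abc", "http://www.example.com"})

# :ets.delete/2 removes the entry for a key and returns true.
true = :ets.delete(table, "abc")

# A subsequent lookup misses, so fetch/2 would call the default-value
# function and cache a fresh result.
[] = :ets.lookup(table, "abc")
```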

I’d admittedly chosen a log_limit of 1 million pairs without knowing the memory implications of that particular cache size. And while lookups are O(1) (constant time, regardless of the number of pairs in the cache), I did want to verify I wouldn’t be running out of memory.

How do we verify this?

First, fire up IEx running the Phoenix application:

iex -S mix phoenix.server

Next, run :observer.start() within IEx to see a GUI outlining various metrics of the running application.
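If you’d rather stay in the shell than open the GUI, the same table metadata can be read programmatically (run inside that IEx session; note that :ets.info/2 reports memory in machine words, not bytes):

```elixir
# Sketch: inspect the cache table's footprint from IEx.
words = :ets.info(:link_cache_table, :memory)
entries = :ets.info(:link_cache_table, :size)

# Convert words to bytes using the runtime's word size (8 on 64-bit).
bytes = words * :erlang.system_info(:wordsize)
IO.puts("#{entries} entries, ~#{Float.round(bytes / 1_048_576, 1)} MB")
```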

Assuming ETS doesn’t make any sort of optimizations around similar values:

```elixir
link = RedirectTo.Queries.Link.by_slug("A") |> RedirectTo.Repo.one

1..1_000_000
|> Enum.each(fn i ->
  LinkCache.Cache.fetch(i |> to_string, fn -> link end)
end)
```

This inserts one million pairs into the cache, with the keys being string versions of each integer ( "1" , "1250" , etc.) and the values being the Struct directly from Ecto.

Over the course of ~40s, all 1 million pairs were stored in-memory (insertion of ~25,000/s) with a result size of around 856MB, or just under 1kB per pair.

Storing a subset of the data results in significantly less memory usage:

```elixir
link =
  RedirectTo.Queries.Link.by_slug("A")
  |> RedirectTo.Repo.one
  |> Map.take([:slug, :long_url])
```

With the cache in place, I deployed the cache- and database-read versions of the application to Heroku, running on the free server and database.

In both tests, I used Siege against the HTTP version of the application, configured not to follow redirects, against the same slug with 500 concurrent users over 60s. In both cases, the Heroku dyno was running (no spin-up time).
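The exact invocation isn’t shown here, but a run along these lines matches that setup (the flags and URL are my assumptions: -c sets concurrent users, -t the duration, and redirect-following is disabled via follow = false in ~/.siegerc rather than on the command line):

```shell
# Hypothetical Siege run: 500 concurrent users for 60 seconds against one slug.
# Assumes "follow = false" in ~/.siegerc so Siege doesn't chase the redirects.
siege -c 500 -t 60S "http://your-app.herokuapp.com/A"
```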

```
Transactions:                  29508 hits
Availability:                 100.00 %
Elapsed time:                  59.75 secs
Data transferred:               2.67 MB
Response time:                  0.70 secs
Transaction rate:             493.86 trans/sec
Throughput:                     0.04 MB/sec
Concurrency:                  343.44
Successful transactions:       29508
Failed transactions:               0
Longest transaction:            5.98
Shortest transaction:           0.04
```

I was impressed by these numbers; almost 30,000 requests per minute on free hardware, with an average response time of roughly 700ms, and this was hitting the database on each request. No connections were dropped.

```
Transactions:                  44829 hits
Availability:                 100.00 %
Elapsed time:                  59.38 secs
Data transferred:               4.06 MB
Response time:                  0.38 secs
Transaction rate:             754.95 trans/sec
Throughput:                     0.07 MB/sec
Concurrency:                  284.19
Successful transactions:       44829
Failed transactions:               0
Longest transaction:            8.39
Shortest transaction:           0.03
```

The throughput when reading from the cache was significantly better, serving 50% more requests. Additionally, the average response time was about half that of the run hitting the database.

Even with Elixir’s incredible speed and concurrency support out of the box (30k RPM on free Heroku hardware with sub-second response times is pretty great), there are always ways to leverage certain technologies for what they’re great at. By leveraging Erlang’s ETS, GenServer, and supervisors, we were able to build a fault-tolerant, scalable in-memory cache to improve both performance and throughput.