
At Tripping.com, whenever a user makes a search, one of our Elixir microservices spawns a set of GenServers to talk to a few external systems. The data they process is sent over websockets to the user’s browser. This part of the code is fairly CPU-, memory- and network-intensive: it keeps network connections open, does a lot of JSON serialization/deserialization, and stores intermediate data in memory to keep things fast.

It has been working great. However, keeping all this data in memory and holding network connections open for idle users is not a good use of our resources. So, to make our app leaner, we wanted to discard any search whose user has been idle for more than 10 minutes.

Timeout a fixed number of seconds after init

Getting a timeout that fires in 10 minutes is pretty easy in Elixir. All you do is send a message to yourself using `Process.send_after`.
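A minimal sketch of this first approach follows. The module name `FixedTimeoutSearch`, the `:timeout` option (added only so the timeout can be shortened for testing) and the choice to stop the process on timeout are assumptions for illustration; the post only says the idle search data is discarded.

```elixir
defmodule FixedTimeoutSearch do
  # Hypothetical module: a search process that dies a fixed time after init.
  use GenServer

  # 10 minutes, as in the post.
  @idle_timeout :timer.minutes(10)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Hypothetical :timeout option so tests can use a short value.
    timeout = Keyword.get(opts, :timeout, @idle_timeout)
    # Schedule a single :idle_timeout message to ourselves.
    Process.send_after(self(), :idle_timeout, timeout)
    {:ok, %{}}
  end

  @impl true
  def handle_info(:idle_timeout, state) do
    # Assumption: "discarding" the search means stopping this process,
    # which frees its memory and connections.
    {:stop, :normal, state}
  end
end
```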

This just starts a timer when the `GenServer` is initialized and times out after a fixed number of seconds. However, we wanted a sliding timeout: one that pushes the deadline further out each time the client shows some activity.

Use Process.cancel_timer

This led us to the next possible solution, using `Process.cancel_timer`. We changed our code to handle a `:touch` message on the `GenServer` that cancelled the previous timer and created a new one.
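Here is a rough sketch of what that version might look like, assuming a hypothetical `touch/1` client function implemented as a cast and the same hypothetical `:timeout` option as before. It deliberately keeps the flaw discussed next, so it is an illustration, not a recommendation.

```elixir
defmodule CancelTimerSearch do
  # Hypothetical module: sliding timeout via Process.cancel_timer/1.
  use GenServer

  @idle_timeout :timer.minutes(10)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  # Hypothetical client API, called on every user action.
  def touch(pid), do: GenServer.cast(pid, :touch)

  @impl true
  def init(opts) do
    timeout = Keyword.get(opts, :timeout, @idle_timeout)
    {:ok, %{timeout: timeout, timer: schedule(timeout)}}
  end

  @impl true
  def handle_cast(:touch, state) do
    # Cancel the pending timer and start a fresh one.
    # FLAW: if the old timer has already fired, its :idle_timeout message
    # is already queued and cancel_timer cannot retract it.
    Process.cancel_timer(state.timer)
    {:noreply, %{state | timer: schedule(state.timeout)}}
  end

  @impl true
  def handle_info(:idle_timeout, state), do: {:stop, :normal, state}

  # Returns the timer reference so it can be cancelled later.
  defp schedule(timeout), do: Process.send_after(self(), :idle_timeout, timeout)
end
```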

However, what happens when the timer fires just before you call `Process.cancel_timer`? At that point the `:idle_timeout` message is already in the `GenServer`'s mailbox; `Process.cancel_timer` simply returns `false` and does not remove it. The message is still processed and ends up (in our case) purging all the search data of a user who was just active. We went down this rabbit hole a bit further to remedy the situation but ended up with a lot of complex code, which we scrapped in favor of a simpler solution.
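The core of the problem can be reproduced in a few lines, outside any `GenServer`. This hypothetical `StaleTimerDemo` module lets a short timer fire, then tries to cancel it and checks whether the message is still waiting in the mailbox.

```elixir
defmodule StaleTimerDemo do
  # Hypothetical demo: cancelling a timer that has already fired does not
  # remove the message it delivered.
  def run do
    timer = Process.send_after(self(), :idle_timeout, 1)

    # Give the timer ample time to fire and deliver its message.
    Process.sleep(50)

    # Returns false: the timer no longer exists because it already fired.
    cancel_result = Process.cancel_timer(timer)

    # The delivered message is still queued and would still be handled.
    still_queued? =
      receive do
        :idle_timeout -> true
      after
        0 -> false
      end

    {cancel_result, still_queued?}
  end
end
```

Calling `StaleTimerDemo.run()` should give `{false, true}`: the cancel "fails" quietly and the stale `:idle_timeout` is still delivered.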

Use a new ref for each new idle timeout

We found a simpler solution: create a new ref (using `make_ref`) and send it as part of the `:idle_timeout` message. Whenever we wanted to `touch` the timer, we just created a new ref and scheduled a new message carrying it. The latest ref was also stored in the `GenServer` state, so when an `:idle_timeout` came in we could compare its ref with the stored one to tell whether it came from the latest timer or from an old one. This turned out to be much simpler and free of the race described above: a stale timer can still fire, but its message is ignored because its ref no longer matches.
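A minimal sketch of the ref-based approach, again assuming a hypothetical module name, a hypothetical `touch/1` cast and a `:timeout` option for testing. The key detail is the first `handle_info/2` clause, which only matches when the ref in the message equals the ref stored in the state.

```elixir
defmodule RefTimeoutSearch do
  # Hypothetical module: sliding idle timeout using a fresh ref per timer.
  use GenServer

  @idle_timeout :timer.minutes(10)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  # Hypothetical client API, called on every user action.
  def touch(pid), do: GenServer.cast(pid, :touch)

  @impl true
  def init(opts) do
    timeout = Keyword.get(opts, :timeout, @idle_timeout)
    {:ok, schedule(%{timeout: timeout, timer_ref: nil})}
  end

  @impl true
  def handle_cast(:touch, state), do: {:noreply, schedule(state)}

  @impl true
  def handle_info({:idle_timeout, ref}, %{timer_ref: ref} = state) do
    # The ref matches the latest timer: the user really has been idle.
    {:stop, :normal, state}
  end

  def handle_info({:idle_timeout, _stale_ref}, state) do
    # A superseded timer fired; a later touch replaced it, so ignore it.
    {:noreply, state}
  end

  # Tag a new timer message with a fresh ref and remember that ref.
  defp schedule(state) do
    ref = make_ref()
    Process.send_after(self(), {:idle_timeout, ref}, state.timeout)
    %{state | timer_ref: ref}
  end
end
```

In this sketch old timers are never cancelled; they simply fire later and hit the second clause. Cancelling them as well would free the pending timers slightly earlier, but correctness rests entirely on the ref comparison, so the cancel is optional.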

Thanks to Richard Duarte for helping me with this blog post.