Assume It Worked and Fix Later

How to Make Your App Faster and More Reliable

During account signup, a web server will make an HTTP request to send an email. Not only are synchronous requests slow, but if the remote host is unresponsive, the application can become unresponsive.

A simple way to improve performance is to use a library like async to concurrently make requests while doing other computations. However, if you need the result of an outgoing network request, you will still have reliability issues if the remote host goes down.
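As a rough sketch of that idea, the following uses only forkIO and an MVar from base instead of the async package, so it runs without extra dependencies. slowRequest is a hypothetical stand-in for an outgoing HTTP call, and its 100 ms delay and reply string are made up for illustration.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar

-- Hypothetical stand-in for a slow outgoing request (100 ms).
slowRequest :: IO String
slowRequest = threadDelay 100000 >> pure "202 Accepted"

-- Start the request in the background, do other work, then wait for the reply.
requestWhileComputing :: IO (Int, String)
requestWhileComputing = do
  reply <- newEmptyMVar
  _ <- forkIO (slowRequest >>= putMVar reply)
  let total = sum [1 .. 1000 :: Int]  -- other computation
  total `seq` pure ()                 -- force it before waiting on the reply
  response <- takeMVar reply          -- blocks until the request finishes
  pure (total, response)
```

The final takeMVar still waits on the remote host, so a slow or dead service stalls the caller. That is the reliability problem described above.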

Often, when a handler is making outgoing requests, the response is not needed from the perspective of the client. Signup emails can meet this criterion, but time-sensitive notifications are an even better example, since they are usually a best-effort service anyway.

If your email service is down, it can be beneficial to have the signup succeed regardless. By decoupling the success of an email request from the success of account signup, you can improve the reliability of your application. That is the “assume it worked” part of the title. You will still need to persist a record of which messages were sent and run a periodic job to send the ones that were not, which is the “fix it later” part. Depending on your requirements, there might not be a “fix it” phase at all.

In an ideal world, you would have a durable queue service like Kafka, co-located with your server, with low or sub-millisecond latency. This magical Kafka is a better and simpler solution than the ones I will present. However, you might not find yourself in such a blessed circumstance.

I’ll walk through an example of making emails non-blocking, using the Amazon Simple Email Service and the corresponding amazonka package, amazonka-ses.

Synchronous Baseline

The simplest method is to make a call to AWS SES inline to send an email.

    post "/user" $ do
      input <- Scotty.body
      email <- maybe missingEmailError return
             $ input ^? key "email" . _String
      resp <- liftIO
            $ runResourceT
            $ runAWS env
            $ AWS.send
            $ makeEmail email
      logFailedRequest resp
      -- Imagine there is code here for
      -- inserting a user into the database
      json $ object ["id" .= email]

Attempt 1: Fork a Thread

An easy way to achieve non-blocking asynchronous behavior is to fork a thread every time one needs to send an email.

    liftIO $ forkIO $ handle logExcept $ do
      resp <- liftIO
            $ runResourceT
            $ runAWS env
            $ AWS.send
            $ makeEmail email
      logFailedRequest resp

If AWS becomes slow, or is timing out, my threads will queue up. The threads will start to eat resources and, if things get bad enough, my app could become unresponsive and crash.

Forking another thread has solved the performance problem in the typical case, but I have increased systemic risk if AWS SES goes down. A down email service can now cause my whole app to crash. Before, only the account creation requests would fail.

Solution 1: Add a Thread with a Timeout

To limit the number of threads that can build up, we can add a timeout:

    liftIO $ forkIO $ handle logExcept $
      logTimeout <=< timeout (60 * 1000000) $ do
        resp <- liftIO
              $ runResourceT
              $ runAWS env
              $ AWS.send
              $ makeEmail email
        logFailedRequest resp

As long as the number of signups arriving within one timeout window stays below the number of concurrent requests our server can sustain, a problematic email service will not take down our site.

The downside is that it is a little unclear whether we have prevented catastrophic failure. For one, we need to estimate our maximum number of concurrent signups. If our rate of signups was 10,000 a minute just as the email service went down, we could be in trouble … but we would probably be in trouble even if the email service was up. That’s a lot of signups. Also, we picked an arbitrary time of one minute for the timeout. It is possible this is too small a value and we are timing out potentially successful email requests.
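To make the timeout behavior concrete, here is a small sketch using timeout from System.Timeout in base, which returns Nothing when the limit is hit and Just the result otherwise. The logTimeout shown here is a hypothetical version of the helper used in the snippet above (its real definition is not shown), and finishedInTime is a made-up function that simulates a request with threadDelay.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- Hypothetical version of the logTimeout helper: log only when the limit was hit.
logTimeout :: Maybe a -> IO ()
logTimeout Nothing  = putStrLn "email request timed out"
logTimeout (Just _) = pure ()

-- Simulate a request that takes workMicros under a limit of limitMicros.
-- Returns True when the simulated request finished in time.
finishedInTime :: Int -> Int -> IO Bool
finishedInTime limitMicros workMicros = do
  result <- timeout limitMicros (threadDelay workMicros)
  logTimeout result
  pure (maybe False (const True) result)
```

Both arguments are in microseconds, which is why the one-minute timeout above is written as 60 * 1000000.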

We also don’t have any way to limit the concurrency of simultaneous requests, or to ensure that all of the threads have finished before we shut down. This change could be a solution, but it leaves room for improvement.

Solution 2: Bounded Queue

Instead of forking a thread for every request, we need a way to quickly queue notification requests. The queue should be bounded, have non-blocking writes (we will just log failures) and blocking reads, so TBMQueue will suffice.
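The “bounded, non-blocking write” property can be sketched with the plain TBQueue from the stm package, which ships with GHC. This is not TBMQueue (which also supports closing the queue); tryWrite is a hypothetical helper written here to mimic a non-blocking write on top of isFullTBQueue.

```haskell
import Control.Concurrent.STM
  (STM, TBQueue, atomically, isFullTBQueue, newTBQueueIO, readTBQueue, writeTBQueue)

-- Non-blocking write: returns False instead of waiting when the queue is full.
tryWrite :: TBQueue a -> a -> STM Bool
tryWrite queue item = do
  full <- isFullTBQueue queue
  if full
    then pure False
    else writeTBQueue queue item >> pure True

-- Offer every item to a queue of the given capacity and report which were accepted.
offerAll :: Int -> [a] -> IO [Bool]
offerAll capacity items = do
  queue <- newTBQueueIO (fromIntegral capacity)
  atomically (mapM (tryWrite queue) items)
```

With a capacity of two, offering three items accepts the first two and rejects the third. The rejected item is the email we would only log as failed. Reads use readTBQueue, which blocks until an item is available.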

First we create our queue and worker thread during server startup:

    worker :: Env -> TBMQueue SendEmail -> IO ()
    worker env queue = do
      -- Make a loop enclosing the env and queue vars.
      let go = do
            -- Block waiting for a new email to send
            mpayload <- liftIO $ atomically $ readTBMQueue queue
            case mpayload of
              -- Nothing means the queue is closed and empty.
              -- Stop the loop, ending the thread
              Nothing -> return ()
              Just payload -> do
                resp <- AWS.send payload
                logFailedRequest resp
                -- Start the loop again
                go
      handle logExcept $ runResourceT $ runAWS env go

    main = do
      env <- newEnv Discover
      queue <- newTBMQueueIO 100000
      threadId <- forkIO $ worker env queue
      scotty ...

We write a simple helper function for enqueueing:

    enqueueEmail :: TBMQueue SendEmail -> Text -> IO ()
    enqueueEmail queue email = do
      msuccess <- atomically
                $ tryWriteTBMQueue queue
                $ makeEmail email
      case msuccess of
        Nothing -> putStrLn "Wat!! The email queue is closed?"
        Just success -> unless success
                      $ putStrLn "Failed to enqueue email!"

We can then use the queue in the handler:

    post "/user" $ do
      input <- Scotty.body
      email <- maybe missingEmailError return
             $ input ^? key "email" . _String
      liftIO $ enqueueEmail queue email
      json $ object ["id" .= email]

We’re done. No matter how slow the email service gets, our app will use, at most, the memory in our bounded queue (which is small) and only the resources needed for our worker thread. Our worst-case situation is that we will fail to send some emails, but our app will stay stable and the performance will be good.

Making It Real

Okay, so we’re not done. We have to handle gracefully draining the queue on shutdown. We also have to restart the thread after exceptions.

To help us with shutdowns and restarts, we’ll use a library called immortal. immortal provides threads that restart after exceptions, and we can wait on their completion. It also uses proper exception masking hygiene to set up an exception handler on the newly spawned thread, something I have elided in the examples above (but is better in the example project and also doesn’t really matter for these examples).
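To show what “restart after exceptions” means, here is a deliberately simplified sketch built only on base. It is not how immortal is implemented: it ignores the exception masking mentioned above, and it stops as soon as the body returns normally instead of waiting for an explicit stop. forkRestarting is a hypothetical name.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (SomeException, try)

-- Run body in a new thread, restarting it whenever it throws.
-- The thread ends when body returns normally; the returned MVar is then filled.
forkRestarting :: IO () -> IO (MVar ())
forkRestarting body = do
  done <- newEmptyMVar
  let loop = do
        result <- try body :: IO (Either SomeException ())
        case result of
          Left err -> putStrLn ("worker crashed, restarting: " ++ show err) >> loop
          Right () -> putMVar done ()
  _ <- forkIO loop
  pure done
```

Calling takeMVar on the returned MVar blocks until the body has finished without an exception, which plays the role of waiting on the thread.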

Our new worker function will look like:

    worker :: Thread -> Env -> TBMQueue SendEmail -> IO ()
    worker thread env queue = do
      -- Make a loop enclosing the thread, env, and queue vars.
      let go :: AWS ()
          go = do
            -- Block waiting for a new email to send
            mpayload <- liftIO $ atomically $ readTBMQueue queue
            case mpayload of
              -- Nothing means the queue is closed and empty.
              -- Stop the loop and kill the thread.
              Nothing -> liftIO $ stop thread
              Just payload -> do
                resp <- AWS.send payload
                logFailedRequest resp
                -- Start the loop again
                go
      handle logExcept $ runResourceT $ runAWS env go

The only thing that changed is that we now take in a Thread and stop the Thread when the queue is empty and closed with:

    Nothing -> liftIO $ stop thread

To create the thread in our main function, we write:

    thread <- create $ \thread -> worker thread env queue

and right before main finishes, we add:

    atomically $ closeTBMQueue queue
    wait thread

which will close the queue and prevent the program from exiting until the queue has been drained. I extended this to multiple workers in the example project.
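The close-then-wait pattern can be tried without stm-chans or immortal. The sketch below swaps in a different technique: a Nothing sentinel written to a plain TBQueue stands in for closeTBMQueue, and an MVar stands in for wait. drainWorker and processAll are hypothetical names, the capacity of 16 is arbitrary, and exception handling is left out.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Concurrent.STM
  (TBQueue, atomically, newTBQueueIO, readTBQueue, writeTBQueue)

-- The worker handles items until it reads Nothing, then signals done.
drainWorker :: TBQueue (Maybe a) -> (a -> IO ()) -> MVar () -> IO ()
drainWorker queue handleItem done = loop
  where
    loop = do
      next <- atomically (readTBQueue queue)
      case next of
        Nothing   -> putMVar done ()
        Just item -> handleItem item >> loop

-- Queue every item, then the sentinel, and block until the worker has drained them.
processAll :: (a -> IO ()) -> [a] -> IO ()
processAll handleItem items = do
  queue <- newTBQueueIO 16
  done <- newEmptyMVar
  _ <- forkIO (drainWorker queue handleItem done)
  mapM_ (atomically . writeTBQueue queue . Just) items
  atomically (writeTBQueue queue Nothing)  -- stands in for closeTBMQueue
  takeMVar done                            -- stands in for wait
```

Because the sentinel is queued behind every real item, processAll returns only after each item has been handled, which is the guarantee we want at shutdown.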

Further Considerations

In our simple example, we are merely logging issues, but a real system might want to backfill the missing emails; having a method for storing which customers have been sent an email could be helpful.
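One possible shape for such a record, sketched as pure functions over a Map from the containers package. The Outbox type and the function names are hypothetical, and the in-memory Map stands in for whatever database table a real system would use.

```haskell
import qualified Data.Map.Strict as Map

data Status = Pending | Sent deriving (Eq, Show)

-- Hypothetical in-memory stand-in for a database table keyed by email address.
type Outbox = Map.Map String Status

-- Record that a signup email is owed, without overwriting an existing entry.
recordSignup :: String -> Outbox -> Outbox
recordSignup address = Map.insertWith (\_new old -> old) address Pending

-- Record that the email for this address went out.
markSent :: String -> Outbox -> Outbox
markSent address = Map.insert address Sent

-- Addresses a periodic backfill job would retry.
unsent :: Outbox -> [String]
unsent outbox = Map.keys (Map.filter (== Pending) outbox)
```

The signup handler would call recordSignup before enqueueing, the worker would call markSent after a successful send, and the periodic job would re-enqueue whatever unsent returns.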

Additionally, the loop should be extended to emit events useful for monitoring, such as sending the queue size to a metrics server. I’ll cover this in a future blog post, which, as a bonus, will include a trick for testing imperative code.
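As a small illustration of reading the queue size, the sketch below uses lengthTBQueue on a plain TBQueue, assuming stm 2.5 or later. The example above uses TBMQueue, whose API may differ, and reportDepth simply prints where a real system would talk to a metrics server. Both function names and the metric name are hypothetical.

```haskell
import Control.Concurrent.STM
  (TBQueue, atomically, lengthTBQueue, newTBQueueIO, writeTBQueue)

-- Read the current number of items waiting in the queue.
queueDepth :: TBQueue a -> IO Int
queueDepth queue = fromIntegral <$> atomically (lengthTBQueue queue)

-- Hypothetical reporter: prints the depth instead of sending it anywhere.
reportDepth :: TBQueue a -> IO ()
reportDepth queue = do
  depth <- queueDepth queue
  putStrLn ("email_queue_depth " ++ show depth)
```

A depth that keeps climbing toward the queue’s capacity is an early sign that the email service is slow and that writes are about to be rejected.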

Conclusion

The steps here were presented in increasing complexity and developer effort. It doesn’t take much to write the final version, but it’s fine to scale up based on your needs and experience (I would skip attempt one…just add the timeout); just make sure you understand the tradeoffs. Ultimately, having a durable, persistent queue like Kafka is probably best, but it is always good to have options.

The “fix it later” portion will require polling the database for unsent emails. I haven’t covered how to do that. By using an in-memory queue, you reduce the polling period you would need for prompt delivery of emails. Additionally, in cases other than a signup email, for instance real-time notifications that are time sensitive, you might not need to backfill missing notifications at all. amazonka is just really easy to use, so I chose it as an example.

If you want to play with the examples above, take a look at this demo web server project which highlights them: https://github.com/jfischoff/asynchronous-email-example
