Handling Rate-Limits with Scala Futures

Slow down while going fast to go faster.

Rate-limits are an unfortunate part of life when developing/integrating against external APIs. Rate-limits vary in their implementation, but usually boil down to some sort of pre-defined limit on the number (and possibly type) of requests that can be made within a specific time-window.

Dealing with this when writing concurrent code, with multiple points of integration, can be difficult. Your first instinct may be to introduce retries; retries alone, however, do not ensure fairness and can result in data-races. To properly deal with, and adhere to, rate-limits, the solution must have the following properties:

- serialization of requests (to ensure fairness)
- tracking of quota usage
- detection of errors related to exceeding the quota
- scheduling of retries, while maintaining serialization
- support for timeouts (in case the queue grows large)
- tracking of queue length and estimated wait-times

This should all make sense except maybe for the first one. It is worth clarifying that serialization here covers only request execution order, not completion order. This means that parallel execution of requests is still on the table. Of course, there wouldn’t be much point in me using Futures in this post if that weren’t the case. 😉

Harnesses

It’s no fun to show a bunch of code if we can’t dump it into a REPL and play with it. The code in this post does not have any dependencies and I’ve put a link at the end of each section to a Gist that is the cumulative code at that point, which you can paste into your REPL (protip: type :paste into the REPL to dump large blocks of code).

Of course, this post only provides the wrapper for your own API calls, so to make sure we still have something to experiment with, here is a mock API we can use throughout the post.
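Below is a minimal sketch of such a mock, with the three variants used later in the post; the canned payload and the coin-flip failure rate are illustrative stand-ins (the exact version lives in the Gists):

```scala
import scala.concurrent.Future
import scala.util.Random

object MockAPI {
  // Always succeeds, instantly, with a canned payload.
  def simpleService: Future[String] =
    Future.successful("1, 2, 3, 4, 5, 6")

  // Simulates an API that signals quota errors with a failed Future.
  def throwingRateLimitedService: Future[String] =
    if (Random.nextBoolean()) Future.successful("1, 2, 3, 4, 5, 6")
    else Future.failed(new RuntimeException("RATELIMIT_EXCEEDED"))

  // Simulates an API that signals quota errors in-band, via a
  // successful response whose body indicates the limit was hit.
  def nonThrowingRateLimitedService: Future[String] =
    if (Random.nextBoolean()) Future.successful("1, 2, 3, 4, 5, 6")
    else Future.successful("RATELIMIT_EXCEEDED")
}
```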

We can copy and paste this into our Scala REPL and start having some fun!

Serializing Requests

Time to cereal’ize requests!

Uh…

Joking, anyways… In order to make our requests serial, we need some kind of FIFO data-structure that we can use to control our requests; so, a queue. However, we also need to limit the number of currently active requests within some time span (our basic rate-limiting support).

Let’s start with some basic accounting structures and a class to wrap this all up in:
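Here is a sketch of those structures, using the names the rest of the post refers to (the implicit ExecutionContext parameter is an assumption about how the execution context gets threaded through):

```scala
import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration._

// A pending request: the user-supplied function that fires the call,
// plus the Promise we complete once the call eventually succeeds.
case class RequestQueueItem[T](request: () => Future[T], promise: Promise[T])

class ApiService[T](limit: Int, timeFrame: FiniteDuration)
                   (implicit ec: ExecutionContext) {

  // FIFO queue of requests waiting their turn.
  private val requestQueue = mutable.Queue.empty[RequestQueueItem[T]]

  // How many requests have fired in the current window, and the time
  // (in millis) at which that window closes.
  private var requestCount   = 0
  private var windowStopTime = System.currentTimeMillis + timeFrame.toMillis
}
```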

Our ApiService class has everything we need to get started. The input to the class is a limit and a duration of time. This forms the basis of our rate-limiting behavior. Inside the class we have a few things to keep track of. First we have the requestQueue which, as it sounds, is a queue of pending requests. For now requests are represented as RequestQueueItem, which we’ll explain more later.

We also have the requestCount and windowStopTime fields, which are used to ensure that for a given time-window, derived from the timeFrame, we do not exceed our limit.

The request method is the star of the API we’re writing today. It’s rather simple in that it takes a function that yields a Future. This provided function is the user-defined code that makes a request to our rate-limited API. request then creates a new Promise object and passes its Future back to the client. Now our code can control the execution of the request and only yield results back to the client when we want (as opposed to composing the user’s Future directly and returning that).

Lastly, the call to tryRequest checks to see if a request can be made and, if so, makes one. This ensures that when the queue has capacity, a request is fired as soon as it is added to the queue, rather than relying on some sort of scheduling code.
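Roughly, that pair looks like the following sketch (the coarse synchronized guard is an assumption, carried through the rest of these sketches):

```scala
// Inside ApiService[T]: accept a request, queue it, and fire it
// immediately if the current window still has capacity.
def request(fn: () => Future[T]): Future[T] = synchronized {
  val promise = Promise[T]()
  requestQueue.enqueue(RequestQueueItem(fn, promise))
  tryRequest()
  promise.future
}

// Fire a queued request right away if we have capacity for it.
private def tryRequest(): Unit =
  if (hasCapacity) makeRequest()
```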

Let’s look at how hasCapacity works:
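Sketched, it is little more than a counter guarded by a rolling window:

```scala
// Inside ApiService[T]: report whether another request may fire. If
// the current window has lapsed, roll over to a fresh one and reset
// the request counter.
private def hasCapacity: Boolean = synchronized {
  val now = System.currentTimeMillis
  if (now > windowStopTime) {
    windowStopTime = now + timeFrame.toMillis
    requestCount = 0
  }
  requestCount < limit
}
```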

hasCapacity checks to see if we can make a request within our current time-window. However, if it notices we are outside of our time-window, then a new window is created and our request counter is reset.

The last bit of our initial code is the actual execution of the request via makeRequest:
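Here is a sketch of it (re-queuing at the head via prepend assumes Scala 2.13’s mutable.Queue):

```scala
import scala.util.{Failure, Success}

// Inside ApiService[T]: pop a request, run it, and react to the result.
private def makeRequest(): Unit = {
  val next = synchronized {
    if (requestQueue.nonEmpty) Some(requestQueue.dequeue()) else None
  }
  next.foreach { item =>
    requestCount += 1
    item.request().onComplete {
      case Success(value) =>
        item.promise.success(value)  // yield the result to the user
      case Failure(_) => synchronized {
        requestQueue.prepend(item)   // assume a limit error: re-queue it
        tryRequest()                 // ...and attempt another request
      }
    }
    // Outside the callback, so independent requests run in parallel
    // and we actually use our full per-window allowance.
    tryRequest()
  }
}
```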

This one is a little longer, but still pretty easy to break down. If we have a request, we increment the request count and then execute the user-supplied request function. If this results in a failure, we re-queue the request and attempt to make another. If we succeed, we yield the result back to the user. Finally, we try to make another request. We do this outside of the Future to allow for parallel execution and to ensure we use our allocated limit (e.g. high throughput + high latency).

Note that on the failure case we have to synchronize, as we’re modifying the state of our queue from a different context within the Future’s callback.

Gist of code so far.

Let’s paste this in a REPL and see what happens:

```
scala> val service = new ApiService[String](5, 10.seconds)

scala> (1.to(10)).map({ _ => service.request(
     |   () => MockAPI.simpleService
     | )}).toList
res5: List[scala.concurrent.Future[String]] = List(
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>)
)
```

Since our MockAPI returns instantaneously, we can see that this works as expected. Given 10 items to execute, we only get 5 back. The remaining 5 have to wait for the next execution window. However, what happens if we wait 10 seconds or so and re-check their status?

```
scala> res5
res6: List[scala.concurrent.Future[String]] = List(
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>)
)
```

Welp, that didn’t seem to work. The problem is that once capacity has been reached, there is no process to kick off the requests again once a new time-window opens up. That is, unless someone makes another request. This is non-ideal. Let’s see how we can fix this.
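One way to sketch the fix, using a scheduler from the Java standard library and wiring it into tryRequest (scheduling the re-check elsewhere would work just as well):

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}

// Inside ApiService[T]: a single-threaded scheduler, plus a handle to
// the pending re-check task (if any). @volatile keeps reads of
// recheckFut consistent across threads.
private val scheduler = Executors.newSingleThreadScheduledExecutor()
@volatile private var recheckFut: Option[ScheduledFuture[_]] = None

private def tryRequest(): Unit =
  if (hasCapacity) {
    // Capacity is available, so any pending re-check is redundant.
    recheckFut.foreach(_.cancel(false))
    recheckFut = None
    makeRequest()
  } else if (recheckFut.isEmpty) {
    // Out of capacity: wake up again when the next window opens.
    val delay = math.max(0L, windowStopTime - System.currentTimeMillis)
    recheckFut = Some(scheduler.schedule(new Runnable {
      def run(): Unit = { recheckFut = None; tryRequest() }
    }, delay, TimeUnit.MILLISECONDS))
  }
```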

This is super cool. We’ve pulled some surprisingly handy utilities from the Java standard library that allow us to schedule tasks asynchronously. Now, when we run out of capacity within our current time-window, we can schedule a task to re-check at the beginning of the next window. And if, for some reason, our capacity changes, we can cancel the task. The one tricky bit in this code is the use of @volatile, which ensures that a consistent version of recheckFut is read (since we are updating it from multiple threads). While this does create a race-condition of sorts, the permutations are simple enough to understand, and it is preferable (IMO) to introducing an additional lock.

Now if we re-run our example earlier, and wait 10 seconds, we should see all of our tasks complete.

```
scala> (1.to(10)).map({ _ => service.request(
     |   () => MockAPI.simpleService
     | )}).toList
res5: List[scala.concurrent.Future[String]] = ...

// Wait 10 seconds and then look at res5
res5: List[scala.concurrent.Future[String]] = List(
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6))
)
```

Things are looking much better now, but there is still a problem that is slightly less obvious. If the user-supplied function fails (for any reason), we assume it was a rate-limit related failure and re-queue the request. The reality is that for any sort of API request over a network, rate-limiting is only one small class of possible errors we could encounter. We’ll explore this further in the next section.

Gist of code so far.

Detecting Quota Errors

Users need a way to define what is a rate-limit error and what isn’t. Moreover, the user should not be limited to a failed Future in order to detect a rate-limit exception. This can easily be represented by the type:

```scala
Either[Throwable, T] => Boolean
```

We can keep the same default behavior while extending it to be override-able by the user:
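A sketch of those changes; modelling the detector as a PartialFunction (aliased as LimitDetector) lets callers pass a bare block of case clauses, as we’ll see shortly:

```scala
// A detector inspects a completed result and answers the question:
// "was this a rate-limit error?"
type LimitDetector[T] = PartialFunction[Either[Throwable, T], Boolean]

// Inside ApiService[T]:

// Original behavior: any failure is a limit error; no success is.
private var defaultDetector: LimitDetector[T] = {
  case Left(_)  => true
  case Right(_) => false
}

// Allow the instance-level default to be swapped out.
def withDefaultLimitDetection(detector: LimitDetector[T]): ApiService[T] = {
  defaultDetector = detector
  this
}

// The detector now rides along with each queued request, so
// RequestQueueItem gains a matching detector field.
def request(fn: () => Future[T])
           (detector: LimitDetector[T] = defaultDetector): Future[T] = synchronized {
  val promise = Promise[T]()
  requestQueue.enqueue(RequestQueueItem(fn, promise, detector))
  tryRequest()
  promise.future
}
```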

To save typing, LimitDetector is an alias for our PartialFunction. request now takes a LimitDetector and stores it in the queue item for later use. If the user does not specify one, the default is used. The default is initially set by the class (to our original behavior) but can be overridden at the instance-level using withDefaultLimitDetection. Now to update the makeRequest function:
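A sketch of the reworked method (treating any result the detector doesn’t cover as a non-limit outcome, via applyOrElse, is an assumption):

```scala
import scala.util.{Failure, Success}

// Inside ApiService[T]: the retry decision is now made by the item's
// detector, on both the success and the failure path.
private def makeRequest(): Unit = {
  val next = synchronized {
    if (requestQueue.nonEmpty) Some(requestQueue.dequeue()) else None
  }
  next.foreach { item =>
    requestCount += 1
    item.request().onComplete { result =>
      val outcome: Either[Throwable, T] = result match {
        case Success(value) => Right(value)
        case Failure(error) => Left(error)
      }
      val rateLimited =
        item.detector.applyOrElse(outcome, (_: Either[Throwable, T]) => false)
      if (rateLimited) synchronized {
        requestQueue.prepend(item) // back to the head of the queue
        tryRequest()
      } else outcome match {
        case Right(value) => item.promise.success(value)
        case Left(error)  => item.promise.failure(error)
      }
    }
    tryRequest() // keep parallel execution going
  }
}
```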

Aside from some general restructuring of the method to make it more readable (reduced nesting), the main change is that upon request completion (result.onComplete) we check both the success and failure cases to determine if a retry is needed. Since this error-handling is specific to rate-exceeded errors, we’ll consider the infinite retry loop being created here intended behavior.

Gist of code so far.

Let’s try this new code using our default limit detection:

```
scala> val service = new ApiService[String](5, 10.seconds)

scala> (1.to(10)).map({ _ => service.request(
     |   () => MockAPI.throwingRateLimitedService
     | )()}).toList
res3: List[scala.concurrent.Future[String]] = List(
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(<not completed>),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>)
)
```

We can actually see that 2 of the 5 initial requests failed and were retried behind the scenes. If we were to look at the result of res3 a few seconds later, we might see something like:

```
scala> res3
res8: List[scala.concurrent.Future[String]] = List(
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(<not completed>),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(<not completed>),
  Future(Success(1, 2, 3, 4, 5, 6)),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>),
  Future(<not completed>)
)
```

You can see that progress is made, but maybe not as much as you would think. This is because failed requests are re-queued at the head of the queue. If they continue to fail, then little progress will be made. Some out-of-order progress is made due to the async nature of requests being re-queued. Eventually all tasks should complete.

And again, we can verify that the same flow works when we supply our own limit-detection logic:

```scala
(1.to(10)).map({ _ =>
  service.request(() => MockAPI.nonThrowingRateLimitedService) {
    case Left(_)    => false
    case Right(msg) => msg == "RATELIMIT_EXCEEDED"
  }
}).toList
```

I’ll leave it to you to play with this in your REPL.

Supporting Timeouts


Rate-limits are a necessary evil that, regardless of what we prefer, we must deal with. Sometimes this means that you may want to make more API calls than you have capacity for. While proper queuing and retries can help…

What! I’ve followed all this time for you to tell me that the fancy queue and retry mechanisms we built aren’t the answer?!? (╯°□°）╯︵ ┻━┻

Calm down, please. These things are important, but when requests continue to come in faster than your quota permits, you may want to provide some sort of fail-fast mechanism so your application can fail gracefully.

The other option, of course, is to ignore the problem and let requests pile up until the queue grows so large that you run out of memory and crash the box. And this happens only after your response times (assuming the results of these API calls propagate to clients) have shot through the roof.

Geez, you don’t have to be so snarky. I’ll go with the first option.

Good choice! We can simply add another parameter to our request function to specify the maximum time we’d like to wait:
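Sketched, with Deadline from scala.concurrent.duration doing the bookkeeping (using Duration.Inf as the "wait forever" default is an assumption):

```scala
import scala.concurrent.duration._

// Inside ApiService[T]: callers can now bound how long a request may
// sit in the queue before we give up on it.
def request(fn: () => Future[T], maxWait: Duration = Duration.Inf)
           (detector: LimitDetector[T] = defaultDetector): Future[T] = synchronized {
  val deadline = maxWait match {
    case wait: FiniteDuration => Some(wait.fromNow) // must start by then
    case _                    => None               // wait forever
  }
  val promise = Promise[T]()
  requestQueue.enqueue(RequestQueueItem(fn, promise, detector, deadline))
  tryRequest()
  promise.future
}
```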

Note that we’ve added one new field to our RequestQueueItem object:
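Sketched, with the detector field from the previous section carried along:

```scala
import scala.concurrent.{Future, Promise}
import scala.concurrent.duration.Deadline

// deadline is None when the caller is willing to wait forever.
case class RequestQueueItem[T](
  request:  () => Future[T],
  promise:  Promise[T],
  detector: LimitDetector[T],
  deadline: Option[Deadline])
```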

The request method now takes a maximum amount of time to wait (with the default being forever) and then calculates a deadline for us to start executing the request. So we’ve stated our intentions, but we still want some things to happen if our deadline is reached:

- We do not want to waste resources executing a request past its deadline
- We should complete the user’s Future as a timed-out, failed future
- We should remove the request from our queue to avoid objects piling up and eating up memory

To do this efficiently we can create a function that trims our queue and call it every N time-units. The time-unit is configured by the client and works similarly to our windowStopTime and timeFrame values:
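For instance, the constructor might grow a sweep interval alongside a mirror of windowStopTime (the parameter name and its one-second default are illustrative):

```scala
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

class ApiService[T](limit: Int,
                    timeFrame: FiniteDuration,
                    cleanupTimeFrame: FiniteDuration = 1.second)
                   (implicit ec: ExecutionContext) {

  // When the next queue sweep is due; mirrors windowStopTime.
  private var nextCleanupTime =
    System.currentTimeMillis + cleanupTimeFrame.toMillis

  // ... everything else as before ...
}
```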

With these new values, we can write a simple cleanup to execute every cleanupTimeFrame:
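A sketch, assuming the sweep is invoked opportunistically (say, from request) in the same style as the window check:

```scala
import java.util.concurrent.TimeoutException

// Inside ApiService[T]: sweep the queue once the cleanup window has
// lapsed, failing any request whose deadline has passed.
private def cleanupRequests(): Unit = synchronized {
  val now = System.currentTimeMillis
  if (now >= nextCleanupTime) {
    val (stale, live) = requestQueue.partition(_.deadline.exists(_.isOverdue()))
    requestQueue.clear()
    live.foreach(requestQueue.enqueue(_))
    stale.foreach(_.promise.failure(
      new TimeoutException("timed out waiting in request queue")))
    nextCleanupTime = now + cleanupTimeFrame.toMillis
  }
}
```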

cleanupRequests checks to see if it is time to clean the queue. If it is, it removes all stale requests from the queue, completes them with a TimeoutException, and resets the time for the next cleanup.

The last thing to take care of is to make sure we don’t perform work on requests that have exceeded their time limit, but managed to avoid the cleanup function before being popped off the queue:
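One way to sketch that is a small helper that makeRequest uses in place of a bare dequeue (nextLiveItem is a made-up name for illustration):

```scala
import java.util.concurrent.TimeoutException

// Inside ApiService[T]: pop items until we find one whose deadline
// has not passed, failing the stale ones as we go.
private def nextLiveItem(): Option[RequestQueueItem[T]] = synchronized {
  while (requestQueue.nonEmpty) {
    val item = requestQueue.dequeue()
    if (item.deadline.exists(_.isOverdue()))
      item.promise.failure(
        new TimeoutException("timed out waiting in request queue"))
    else
      return Some(item)
  }
  None
}
```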

The makeRequest function now quickly discards requests past the deadline until either a valid request is found or the queue is empty.

Conclusion

There we have it: a simple rate-limiting service that works around (and with) Scala Futures. The full, documented source can be found here:

https://gist.github.com/JohnMurray/34e3beb7f5eed4935f70dc45b0256067

Some final notes before you go: