A Guide to Rate Limiting with Examples in JavaScript

Learn two rate limiting strategies you should avoid, two strategies you should be using, and how to implement them in Node.js and JavaScript.

Rate limiting is an effective and relatively easy way to mitigate security risks. It will not be the only thing you do to secure your applications, and it might not even be the most important thing you do to secure your applications, but it should ALWAYS be in your toolbox.

Let’s take a case where an attacker tries to guess a user’s password. If you set a limit on the number of times a password can be attempted per day, it will cripple the hacker’s attack and keep your users safe.

If you don’t rate limit, attackers can use your CPU and memory to crack your users’ passwords!

If you accidentally allow users to read any arbitrary record from your database instead of only the records they should have access to, the problem will be much less severe if they can only read one unauthorised record per minute rather than extracting 1,000 records per minute.

Rate limiting makes the effects of being compromised less severe

We’ll consider two good types of rate limiting in this post, but first let's look at a few examples that are bad but commonly used.

Fixed Window Rate Limiting — Don’t Use This

Fixed window rate limiting is very simple. You say something like — each user can make 10 requests to my API per hour. The implementation is simple:

1. Keep a counter per user for the current hour
2. Increment the counter each time the user makes a request
3. Reject the request if the counter is over the threshold
4. Reset all the counters at the start of each hour
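The steps above can be sketched in a few lines of JavaScript. This is an illustrative in-memory version (the `Map`, the function names, and the 10-per-hour limit are assumptions for the example, not a production implementation):

```javascript
// Fixed window rate limiter sketch: one counter per user,
// all counters wiped at the start of each window.
const LIMIT = 10;
const counters = new Map(); // userId -> request count in the current window

function allowRequest(userId) {
  const count = counters.get(userId) ?? 0;
  if (count >= LIMIT) return false; // over the threshold: reject
  counters.set(userId, count + 1);
  return true;
}

// Reset every counter at the start of each hour.
// .unref() lets a Node.js process exit even while this timer is pending.
setInterval(() => counters.clear(), 60 * 60 * 1000).unref();
```

Note how cheap this is to store (one integer per user), which is exactly why it’s so commonly used despite the problems below.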

The trouble is, this can be very frustrating and very unfair.

❌ Users who start making requests just before the hour ends get to make many more requests in the first few minutes than users who start making requests just as the hour has started.

❌ A user who uses more than you expected can be given a very harsh penalty. E.g. if I get to make 10 requests per hour, I can make 10 requests in under a second, but it then takes over an hour to make 11 requests. That seems unfair.

Sliding Window Rate Limiting — Don’t Use This

Here, instead of saying “you can make 10 requests in each hour, aligned to the hours on the clock” we say “you can make 10 requests in any period of an hour”. This means that if you make 2 requests per minute for half an hour, you then have to wait half an hour before you can make any more requests, at which point you can make up to 2 requests each minute for the next half hour.

✅ This feels much less unfair. Regardless of when you start making requests, you get the same deal.

❌ Unfortunately, it’s still easy to get in a situation where you have to wait a full hour to make any more requests, which really sucks.

❌ To make this approach even worse, we now have to keep track of the timestamp of every request, in order to accurately account for how many requests the user is allowed to make.
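To make that storage cost concrete, here is a sketch of a sliding window limiter (hypothetical names; a real implementation would also need to evict idle users). Notice that we have to keep an array of timestamps per user, not just a counter:

```javascript
// Sliding window rate limiter sketch: every request timestamp inside
// the window must be remembered to know how many requests remain.
const LIMIT = 10;
const WINDOW_MS = 60 * 60 * 1000; // one hour
const requestLog = new Map(); // userId -> array of request timestamps

function allowRequest(userId, now = Date.now()) {
  // Drop timestamps that have fallen out of the one-hour window
  const timestamps = (requestLog.get(userId) ?? []).filter(
    (t) => now - t < WINDOW_MS
  );
  if (timestamps.length >= LIMIT) {
    requestLog.set(userId, timestamps);
    return false; // the last LIMIT requests were all within the window
  }
  timestamps.push(now);
  requestLog.set(userId, timestamps);
  return true;
}
```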

Token Bucket Rate Limiting

Token bucket rate limiting is a fairly easy system to implement. All you need is a counter representing how many “tokens” a user has, and a timestamp at which that count was last increased.

A bucket of tokens with a gate that requires a token to walk through

The way this works is easiest to visualize as a bucket of tokens and a gate that requires a token to walk through. Each time we walk through the gate (make a request), we supply a token from the bucket.

Removing a token from the bucket to pass through the gate

If we keep walking through the gate (making requests), we’ll eventually run out of tokens.

Using the last token

Now when we try to make one more request, we won’t be able to, because we’ve run out of tokens.

There are no more tokens left

Fortunately, while we’re busy spending tokens to make requests, another background process is adding a token at a set interval. E.g. if we stick to our 10 requests per hour model, a token is added every 6 minutes.

1 token every 6 minutes

This means that the most we have to wait between one request and the next is 6 minutes, which feels much fairer.

As long as we don’t make requests more than once every 6 minutes on average, we’ll never run out of tokens. If we make requests less frequently than once every 6 minutes, eventually the bucket will fill up with tokens. At that point, any new tokens will fall out of the bucket and be lost.

The bucket of tokens overflowing

The size of the bucket controls how “bursty” we’re allowed to be, and the frequency with which tokens are added controls the maximum rate of requests we’re allowed to sustain over the long term. E.g. if the bucket size is 10 and a new token is added once every 6 minutes, we can make 20 requests in the first hour (up to 10 in the first instant, then 1 every 6 minutes), but if we do that we’ll only be able to make 1 request every 6 minutes from then on. In the long run, this works out about the same as just allowing 10 requests per hour.

✅ This is fair regardless of when you start making requests — A new bucket is allocated on your first request

✅ This is never punitive — The most you’ll ever have to wait is the time between tokens being added

✅ This is efficient to implement — All you need to store is a timestamp and a token count for each user

The code for implementing rate limiting from scratch in JavaScript is only about 35 lines (with comments):

Implementation of Bucket Rate Limiting

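Here is a sketch of that implementation. Rather than running a literal background process, it tops the bucket up lazily from the stored timestamp whenever `takeToken` is called, which is equivalent and means we really do only store a token count and a timestamp per user (the exact names and structure are an illustrative reconstruction, not the original sandbox code):

```javascript
// Token bucket rate limiter sketch.
const BUCKET_SIZE = 10; // controls how "bursty" a user is allowed to be
const TOKEN_INTERVAL_MS = 6 * 60 * 1000; // one new token every 6 minutes
const buckets = new Map(); // userId -> { tokens, lastRefill }

function takeToken(userId, now = Date.now()) {
  // A full bucket is allocated on the user's first request
  const bucket = buckets.get(userId) ?? { tokens: BUCKET_SIZE, lastRefill: now };

  // Add one token per interval elapsed since the last refill;
  // tokens beyond the bucket size "fall out" and are lost
  const newTokens = Math.floor((now - bucket.lastRefill) / TOKEN_INTERVAL_MS);
  bucket.tokens = Math.min(BUCKET_SIZE, bucket.tokens + newTokens);
  bucket.lastRefill += newTokens * TOKEN_INTERVAL_MS;
  buckets.set(userId, bucket);

  if (bucket.tokens < 1) return false; // bucket empty: reject
  bucket.tokens -= 1; // spend a token to pass through the gate
  return true;
}
```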

Note: This method is also sometimes called “Leaky Bucket” rate limiting, but that metaphor is much more confusing in my opinion.

Exponential Delay Rate Limiting

The other type of rate limiting you should be familiar with is “exponential delay”. This is only appropriate where:

You want to be punitive with what you perceive as abuse

You have a natural point to “reset” the rate limit

There’s pretty much just one use case for exponential delay, and that’s passwords.

The wrong way to handle passwords is to just lock people out after a certain number of attempts. This has two issues:

1. It’s easy to get permanently locked out of your account if you forget your password.
2. Someone else can abuse this feature to permanently lock you out of your account, and they can do so very quickly.

Exponential rate limits don’t have either of these problems, at least not to the same extent.

With an exponential rate limit, the first few attempts are quick, and then subsequent attempts rapidly take longer and longer. An attacker trying to brute force the password will end up needing an eternity to try enough different passwords, and an attacker trying to deny legitimate users access will need to spend just as much time to keep the user locked out for extended periods.

The code for exponential delay is even simpler than for bucket rate limiting, and our takeToken function is exactly as before.

Implementation of Exponential Delay Rate Limiting

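A sketch of the idea: the wait before the next allowed attempt doubles with each failure, and a successful login resets the state entirely. The function names and the one-second base delay are illustrative assumptions, not the original sandbox code:

```javascript
// Exponential delay rate limiter sketch for login attempts.
const BASE_DELAY_MS = 1000; // 1 second after the first failure
const attempts = new Map(); // username -> { failures, nextAllowedAt }

function canAttempt(username, now = Date.now()) {
  const state = attempts.get(username);
  return !state || now >= state.nextAllowedAt;
}

function recordFailure(username, now = Date.now()) {
  const failures = (attempts.get(username)?.failures ?? 0) + 1;
  // The delay doubles with every failure: 1s, 2s, 4s, 8s, ...
  const delay = BASE_DELAY_MS * 2 ** (failures - 1);
  attempts.set(username, { failures, nextAllowedAt: now + delay });
}

function recordSuccess(username) {
  attempts.delete(username); // reset the limit on a successful login
}
```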

Conclusion

Use bucket rate limiting to secure APIs because it’s fair and proportional for legitimate users while restricting abusive users.

Use exponential delay rate limiting for passwords and reset the delay after any successful login.

You should apply a bucket rate limit per IP address before applying the exponential rate limit by username, to stop an attacker trying the same password against every username.
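The layering can be sketched as a login handler that checks the IP bucket first, then the per-username exponential delay. Everything here is hypothetical: `takeToken`, `canAttempt`, `recordFailure`, `recordSuccess`, and `verifyPassword` are stand-ins for your real implementations, injected as dependencies so the ordering is the only thing on show:

```javascript
// Layered rate limiting sketch: per-IP bucket, then per-username
// exponential delay, then the actual credential check.
function loginHandler(ip, username, password, deps) {
  const { takeToken, canAttempt, recordFailure, recordSuccess, verifyPassword } = deps;
  // 1. Bucket limit by IP: stops one machine spraying one password
  //    across every username
  if (!takeToken(ip)) return 'rate_limited';
  // 2. Exponential limit by username
  if (!canAttempt(username)) return 'rate_limited';
  // 3. Only now do the (expensive) password check
  if (!verifyPassword(username, password)) {
    recordFailure(username);
    return 'invalid_credentials';
  }
  recordSuccess(username); // reset the delay after any successful login
  return 'ok';
}
```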

If you’re using Node.js, and you don’t fancy maintaining your own rate limiting implementations (which you absolutely shouldn’t), you can use mine: https://www.atauthentication.com/docs/rate-limit