SW engineering, engineering management and the business of software

I’ve been spending lots of time thinking about and discussing APIs lately.

Eventually, the topic of rate limiting comes up, because an API is an open invitation for people to cause work to be done on your servers. Most of the time, people are polite about it, but a few curious people (and very rarely outright malicious people) will stress the limits of any exposed API.

At a high level, developer API keys and user authentication helps, but ultimately some form of rate limiting becomes necessary.

One scheme I thought up was likely born out of my youth going to local arcades. Back in the stone age, if you wanted to play video games, you had to pester your mother until she drove you to a dedicated place of business. At that point you continue to pester until a conversion of money to quarters or tokens takes place. You would then insert one or more of the tokens into a game of your choice for a few chances at dopamine release.

The rate-limiting model works like the arcades of antiquity with a very generous and patient mother. Each user has a set number of credits in a credit pool. Credits are deducted from the pool each time you hit an API endpoint. These credits have a regeneration rate (X/minute) and also a cap (MAX_CRED). Each endpoint would consume one or more credits.

The trick is that endpoints have a credit cost relative to the resources required. For example, GET methods to items that are easily cached would only cost a few credits. Expensive endpoints, such as multi-server queries with JOINS or POST methods that upload/create resources that take permanent storage would cost an order of magnitude more.

No Stalling

This system allows “spiky” user behavior without arbitrary stalling. Other rate-limiting systems use a time window of 15 to 60 minutes. If an end user exceeds their quota in the first minute of a 15 minute window, they have to wait 14 minutes before doing anything. To differentiate between expensive and cheap actions, each endpoint (or group of endpoints) would have to track its own time window.

With a credit based system, the window can be arbitrarily small. A regeneration rate of 6/minute corresponds to an effective time window of 10 seconds. If an end user blows through CRED_MAX, they aren’t stalled for very long before they can resume inexpensive actions.

KISS

In the simplest case, you are only tracking one pool per user, rather than having a time window for every endpoint. An API could could certainly have multiple pools, but it is not required.

Furthermore, the system doesn’t need to actually keep track of every user’s credit balance every minute, but rather just the user’s last known balance at a point in time.

A hypothetical example where credits regenerate at one/minute would be:

Time: Event: Cost: Pool Balance: 00:00 User A has CRED_MAX -- 100 00:10 User A POSTs a new image 20 80 00:10 User A POSTs a new image 20 60 00:10 User A POSTs a new image 20 40 00:20 User A GETs list of images 02 48

When the app reads the credit balance at 00:20, we have a record that states the balance was 40 at time 00:10. The hypothetical getCreditBalanceForUser() function does some math knowing that 10 minutes have passed (during which 10 credits have been regenerated) and returns the current pool balance of 50 which is enough to cover the cost of the GET. There is no need to iterate across all users and increment the credit value every minute.

This system adheres to one of the principles of scalable architectures:

Don’t incur resource costs for actions that aren’t taken.

In this case, no work is being done by the system from time 00:11 ~ 00:19, even though conceptually, credits are regenerating during that time.

In Practice

One of the advantages of this scheme is that the back-end storage of credits can live happily in Redis or some other memory based system without permanent storage. If the server is reset, then everyone just gets a free play; their credits can be temporarily reset to CRED_MAX. A memory based system is extremely unlikely to be a bottleneck for any given endpoint. Redis is doubly appropriate because of its nice increment and decrement operations and EXPIRE can be used to clear out users that reach the credit cap.

In practice, I’m looking at a very fine-grained production implementation, where CRED_MAX is in the thousands and regeneration rate is near one per second. Cheap, cacheable endpoints cost 5-10 credits and expensive ones are in the double or triple digits. Ultimately, you’ve succeeded if the vast majority of end-users never notice the rate-limiting system at all.

Now, if only I had a quarter for every time my articles caused a dopamine release.

Many thanks to @jedlau, @TimHaines, @jkubicek and @nolancaudill for reading drafts.