The key to understanding GCRA is recognizing that you're calculating the start and end times of a window, and that requests are valid only within that window. This realization made the rest of GCRA easier for me to understand. Now let's walk through figuring out the windows that GCRA looks for.

Variables Used

There are some variables we should define before going further. Specifically, let's take a short step back to think about rate limiting.

A rate limit is usually defined as the number of operations you can perform within a period of time, e.g., 5,000 HTTP requests per hour. Some services also allow burst capacity: they may advertise 20,000 operations per hour and give you an extra 10% if you need to burst past that, so your true rate limit is 22,000 operations per hour. Let's call the true limit LIMIT and the period of time it applies to the PERIOD, e.g., LIMIT = 22000 and PERIOD = 3600 (= 60 min/hour * 60 sec/min). Next, let's define the KEY as the name of the "bucket". Finally, we'll define QUANTITY as the number of operations being requested and ARRIVED_AT as the time the operation request was received. This might look like:

LIMIT = 22000
PERIOD = 3600
KEY = "operationA/user@example.com"
QUANTITY = 1
ARRIVED_AT = NOW()

From these variables we can derive more information that we need for GCRA. One thing we need to know is the interval at which used capacity is "emitted" (freed up again), often called the "emission interval", which we'll call EMISSION_INTERVAL. This can be derived like so:

EMISSION_INTERVAL = PERIOD / LIMIT
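To make the numbers concrete, here is a minimal Python sketch using the example LIMIT and PERIOD from above. The helper name emission_interval and its input check are my own additions for illustration, not part of any GCRA specification.

```python
LIMIT = 22000  # operations allowed per period (burst included)
PERIOD = 3600  # length of the period, in seconds


def emission_interval(period: float, limit: int) -> float:
    """Return how many seconds of the period one operation accounts for."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return period / limit


EMISSION_INTERVAL = emission_interval(PERIOD, LIMIT)
print(f"{EMISSION_INTERVAL:.4f}")  # 0.1636 (seconds per operation)
```

In other words, with these example values one operation's worth of capacity comes back roughly every 0.16 seconds.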

We also need to know the capacity of the bucket for the window bounds (and remember we're windowing based on time). In this case, it will be our limit multiplied by our emission interval, which means

DELAY_VARIATION_TOLERANCE = LIMIT * EMISSION_INTERVAL
# or
DELAY_VARIATION_TOLERANCE = PERIOD
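As a quick sanity check that the two definitions agree, here is a small Python sketch with the same example values. The variable names dvt_from_product and dvt_from_period are hypothetical. One practical note, which is my own observation rather than anything GCRA requires: with floating-point arithmetic the product may differ from PERIOD by a rounding error, so using PERIOD directly is the simpler choice.

```python
import math

LIMIT = 22000
PERIOD = 3600  # seconds

EMISSION_INTERVAL = PERIOD / LIMIT

# The two equivalent definitions from the text.
dvt_from_product = LIMIT * EMISSION_INTERVAL
dvt_from_period = float(PERIOD)

# Mathematically identical; with floats, compare using a tolerance instead of ==.
print(math.isclose(dvt_from_product, dvt_from_period))  # True
```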

Next, we need to move our window forward each time we handle a request to perform an operation. How far we move the window depends on how costly the operation is, which can be determined (loosely) by the QUANTITY of operations being requested and the EMISSION_INTERVAL. This window movement is crucial to the sliding window technique used to leak from our bucket. We can define the INCREMENT as

INCREMENT = EMISSION_INTERVAL * QUANTITY

If we did have actual costs associated with operations, we could have a modifier that factors in here as well, e.g., EMISSION_INTERVAL * QUANTITY * COST_MODIFIER, but for most web applications today there is not much of a difference in costs.
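The increment, including the optional cost modifier, could be sketched in Python like this. The function name increment, the default cost_modifier of 1.0, and the input checks are assumptions made for illustration; the example uses an emission interval of 6 seconds (a limit of 10 per 60 seconds) to keep the arithmetic easy to follow.

```python
def increment(emission_interval: float, quantity: int, cost_modifier: float = 1.0) -> float:
    """Return how far (in seconds) a request moves the window forward."""
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    if cost_modifier <= 0:
        raise ValueError("cost_modifier must be positive")
    return emission_interval * quantity * cost_modifier


print(increment(6.0, 1))                     # 6.0
print(increment(6.0, 3))                     # 18.0
print(increment(6.0, 3, cost_modifier=2.0))  # 36.0
```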

Now we can focus on the pieces where everyone else tends to start. The first important variable is the "theoretical arrival time", which is commonly referred to as TAT. Based on our TAT and our INCREMENT, we determine the new theoretical arrival time, which we'll call NEW_TAT:

NEW_TAT = TAT + INCREMENT

This NEW_TAT defines the end of our new window. To find the start of our window, we need to find the time our next request will be allowed. We calculate this from our DELAY_VARIATION_TOLERANCE because that is the capacity of our bucket and helps us find the starting bound of our window.

ALLOW_AT = NEW_TAT - DELAY_VARIATION_TOLERANCE
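Here is a small worked example of the two window bounds in Python. The numbers are hypothetical and chosen for easy arithmetic (a limit of 10 per 60 seconds and a stored TAT of 100 seconds on an arbitrary clock); only the formulas come from the text above.

```python
LIMIT = 10
PERIOD = 60.0   # seconds
QUANTITY = 1
TAT = 100.0     # hypothetical stored theoretical arrival time, in seconds

EMISSION_INTERVAL = PERIOD / LIMIT        # 6.0 seconds per operation
DELAY_VARIATION_TOLERANCE = PERIOD        # capacity of the bucket, in seconds
INCREMENT = EMISSION_INTERVAL * QUANTITY  # how far this request moves the window

NEW_TAT = TAT + INCREMENT                        # end of the new window
ALLOW_AT = NEW_TAT - DELAY_VARIATION_TOLERANCE   # start of the new window

print(NEW_TAT, ALLOW_AT)  # 106.0 46.0
```

With these values the window spans from 46 to 106, which is exactly one PERIOD wide.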

Okay, so now we have the five pieces of information we need to find the remaining amount of our limit:

ARRIVED_AT

ALLOW_AT

NEW_TAT

TAT

EMISSION_INTERVAL

From here we can calculate the remaining limit as the floor (largest integer that is less than or equal to the value) of:

((ARRIVED_AT - ALLOW_AT) / EMISSION_INTERVAL) + 0.5

In Python this would look like:

import math

remaining = math.floor(((arrived_at - allow_at) / emission_interval) + 0.5)

The implication of this calculation is that the closer a request's arrival time is to ALLOW_AT, the less capacity remains. The further the arrival time is past ALLOW_AT (later in time), the more capacity we have remaining.
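To see that relationship in numbers, here is a rough end-to-end sketch in Python that chains the formulas from this section together. The function name remaining, its signature, and the sample values (a limit of 10 per 60 seconds and a stored TAT of 100) are assumptions for illustration. It deliberately ignores the nuances of the algorithm, such as how TAT is stored or what happens when a request falls outside the window.

```python
import math


def remaining(arrived_at: float, tat: float, limit: int, period: float, quantity: int = 1) -> int:
    """Compute the remaining limit using only the formulas from this section."""
    emission_interval = period / limit
    delay_variation_tolerance = period
    increment = emission_interval * quantity
    new_tat = tat + increment
    allow_at = new_tat - delay_variation_tolerance
    return math.floor(((arrived_at - allow_at) / emission_interval) + 0.5)


# Same stored TAT, three different arrival times; here ALLOW_AT works out to 46.0.
for arrived_at in (46.0, 70.0, 100.0):
    print(arrived_at, remaining(arrived_at, tat=100.0, limit=10, period=60.0))
# 46.0 0
# 70.0 4
# 100.0 9
```

An arrival exactly at ALLOW_AT leaves nothing remaining, while an arrival 54 seconds later leaves 9 of the 10 operations.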

So now that we know how to calculate all of this, let's get into some of the nuanced pieces of our algorithm.