The rate limiting functionality offered by the Kubernetes-native Ambassador API gateway is fully customisable: any service that implements the required gRPC endpoint can decide whether a request should be limited or not. In this article, which builds on part 1 and part 2 of this series, you will learn how to build and deploy a simple Java-based rate limiting service for Ambassador.

Getting Set Up: The Docker Java Shop

In my previous tutorial, “Deploying Java Apps with Kubernetes and the Ambassador API Gateway”, I added the open source Ambassador API gateway in front of an existing series of Java (Spring Boot and Dropwizard) services deployed into Kubernetes. If you haven’t read that tutorial, I would definitely recommend working through it and the others in the series in order to familiarise yourself with the fundamentals. The rest of this article assumes you’re comfortable building Java-based microservices and deploying them to Kubernetes, and that you have all of the prerequisites installed (I’m using Docker for Mac Edge, with built-in Kubernetes support, but the principles should be similar if you are using minikube or a remote cluster).

Prerequisites

You will need to have these installed locally:

Docker for Desktop: I am using the edge community edition (18.04.0-ce), which has built-in support for a local Kubernetes cluster. I have also increased the memory available to Docker to 8GB, as the Java services can be a little memory-hungry at times :-)

An editor of your choice, such as Atom, VS Code, or IntelliJ for the Java code

You can grab the latest version of the “Docker Java Shop” source code here:

https://github.com/danielbryantuk/oreilly-docker-java-shopping

You can clone the repo via SSH like so:

$ git clone git@github.com:danielbryantuk/oreilly-docker-java-shopping.git

The initial version of the service architecture and deployment looked as follows:

You can see from the diagram that the Docker Java Shopping application consists primarily of three simple services, and in the previous tutorial you added the Ambassador API Gateway as the “front door” of the system. It is worth noting that the Ambassador API Gateway will be running on port 80, the standard (unencrypted) HTTP port, so you will need to make sure nothing else is running locally on the same port.

Rate Limiting 101 with the Ambassador API Gateway

I have added a new folder, “kubernetes-ambassador-ratelimit”, to the repo that contains the Kubernetes config for this tutorial, so go ahead and navigate to this directory via the command line. Listing the directory contents should show the following files:

(master *) oreilly-docker-java-shopping $ cd kubernetes-ambassador-ratelimit/
(master *) kubernetes-ambassador-ratelimit $ ll
total 48
0 drwxr-xr-x   8 danielbryant  staff   256 23 Apr 09:27 .
0 drwxr-xr-x  19 danielbryant  staff   608 23 Apr 09:27 ..
8 -rw-r--r--   1 danielbryant  staff  2033 23 Apr 09:27 ambassador-no-rbac.yaml
8 -rw-r--r--   1 danielbryant  staff   698 23 Apr 10:30 ambassador-rate-limiter.yaml
8 -rw-r--r--   1 danielbryant  staff   476 23 Apr 10:30 ambassador-service.yaml
8 -rw-r--r--   1 danielbryant  staff   711 23 Apr 09:27 productcatalogue-service.yaml
8 -rw-r--r--   1 danielbryant  staff   659 23 Apr 10:02 shopfront-service.yaml
8 -rw-r--r--   1 danielbryant  staff   678 23 Apr 09:27 stockmanager-service.yaml

You can apply these Kubernetes config files with the following command:

$ kubectl apply -f .

Doing so will deploy the following service architecture; the primary difference from the previous architecture is the addition of the “ratelimiter” service. This service is written in Java, without a web/microservices framework, and it exposes a gRPC endpoint that can be used by Ambassador for rate limiting. This allows for customisation and flexibility in the rate limiting algorithm you implement (for more details on the benefits of this, check out my earlier article!).

Exploring the Rate Limiter Kubernetes Service

The ratelimiter service is deployed into Kubernetes just like any other service, and could be horizontally scaled as appropriate. Here are the contents of the ambassador-rate-limiter.yaml Kubernetes config file:

---
apiVersion: v1
kind: Service
metadata:
  name: ratelimiter
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: RateLimitService
      name: ratelimiter_svc
      service: "ratelimiter:50051"
  labels:
    app: ratelimiter
spec:
  type: ClusterIP
  selector:
    app: ratelimiter
  ports:
  - protocol: TCP
    port: 50051
    name: http
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: ratelimiter
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: ratelimiter
    spec:
      containers:
      - name: ratelimiter
        image: danielbryantuk/ratelimiter:0.3
        ports:
        - containerPort: 50051

You will explore the contents of the underlying “danielbryantuk/ratelimiter:0.3” Docker image later in the article, but for now all you need to know is that this service is running within the cluster, and exposes port 50051.

In the ambassador-service.yaml config file I have also updated the Ambassador Kubernetes annotation config to ensure that requests to the shopfront service are rate limited, simply by including the “rate_limits” property. I have also added some additional metadata, “- descriptor: Example descriptor”, which I will explain in more detail in the next article. For now, I’ll just say that this is a good way to pass additional metadata into the rate limiting service.

---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador
  name: ambassador
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: shopfront_stable
      prefix: /shopfront/
      service: shopfront:8010
      rate_limits:
      - descriptor: Example descriptor

You can check that the deployment has succeeded using kubectl:

(master *) kubernetes-ambassador-ratelimit $ kubectl get svc
NAME               TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
ambassador         LoadBalancer   10.105.253.3     localhost     80:30051/TCP     1d
ambassador-admin   NodePort       10.107.15.225    <none>        8877:30637/TCP   1d
kubernetes         ClusterIP      10.96.0.1        <none>        443/TCP          16d
productcatalogue   ClusterIP      10.109.48.26     <none>        8020/TCP         1d
ratelimiter        ClusterIP      10.97.122.140    <none>        50051/TCP        1d
shopfront          ClusterIP      10.98.207.100    <none>        8010/TCP         1d
stockmanager       ClusterIP      10.107.208.180   <none>        8030/TCP         1d

All six of our services look good to go (plus the Kubernetes service): that’s three Java services, two Ambassador services, and the ratelimiter service.

You can test the deployment by making a curl request to the shopfront endpoint, which (as shown above) should be available on the EXTERNAL-IP of localhost on port 80:

(master *) kubernetes-ambassador-ratelimit $ curl localhost/shopfront/
<!DOCTYPE html>
<head>
<meta charset="utf-8" />
...
<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="js/bootstrap.min.js"></script>
</body>
</html>

You will notice that this produces a lot of HTML, which is simply the front page of the Docker Java Shop, and can be more easily viewed in a browser pointed at http://localhost/shopfront/. However, for our rate limiting experiments it will be easier to use curl.

Testing the Rate Limiting

For this demonstration rate limiting service, I have decided to rate limit against the target service itself, i.e. when the rate limit service calculates whether or not to limit a request, the only metric I consider is the number of requests made against a specific backend service within a time period. The rate limiting algorithm implemented within the code uses the token-bucket algorithm, with a maximum bucket size of 20 and a refill rate of 10 tokens per second. Because the rate limiting currently applies to every request, this means you can make 10 requests against the API per second without any issues, and you can also burst above this temporarily because the bucket initially contains 20 tokens. However, as soon as the initial “burst” tokens have been used and you attempt to make more than 10 requests per second, you will receive an HTTP 429 “Too Many Requests” status code. At this point the Ambassador API gateway is not forwarding the requests to the backend service.
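To make the token-bucket behaviour concrete, here is a minimal, hand-rolled sketch of the algorithm using the same numbers (a bucket capacity of 20 and a refill rate of 10 tokens per second). This TokenBucket class is purely illustrative, as the actual service uses the bucket4j library; the injectable clock is an assumption added here to keep the example deterministic:

```java
import java.util.function.LongSupplier;

// Illustrative token bucket (the real service uses bucket4j, not this class).
public class TokenBucket {
    private final long capacity;        // maximum burst size, e.g. 20
    private final double refillPerSec;  // tokens added per second, e.g. 10
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double refillPerSec, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.nanoClock = nanoClock;
        this.tokens = capacity;          // the bucket starts full, allowing an initial burst
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // Attempt to take n tokens; returns false when the caller should be rate limited.
    public synchronized boolean tryConsume(int n) {
        refill();
        if (tokens >= n) {
            tokens -= n;
            return true;
        }
        return false;   // over limit: Ambassador would return HTTP 429
    }

    // Top the bucket up according to elapsed time, capped at the capacity.
    private void refill() {
        long now = nanoClock.getAsLong();
        double elapsedSec = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSec * refillPerSec);
        lastRefillNanos = now;
    }
}
```

With these settings a client can burst 20 requests immediately, and thereafter sustain roughly 10 requests per second before tryConsume starts returning false.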

Let’s see if you can simulate this by issuing lots of requests via curl. You’ll want to suppress the HTML payload (--output /dev/null) and curl’s progress output (--silent), while still displaying any non-OK HTTP response status codes (--show-error --fail). You can put all of these curl options together with a simple bash loop and date output (to show when each request was made) in order to create a very crude load generator (get ready to press CTRL-C to terminate the loop!):

$ while true; do curl --silent --output /dev/null --show-error --fail http://localhost/shopfront/ ; echo -e $(date); done
Tue 24 Apr 2018 14:16:31 BST
Tue 24 Apr 2018 14:16:31 BST
Tue 24 Apr 2018 14:16:31 BST
Tue 24 Apr 2018 14:16:31 BST
...
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
curl: (22) The requested URL returned error: 429 Too Many Requests
Tue 24 Apr 2018 14:16:35 BST
^C

As you can see, the first several requests are served fine, as evidenced by the timestamps being printed with no errors alongside them. Quickly (at least on my Mac) the loop exceeds 10 requests per second, and I start receiving HTTP 429 response codes.

As an aside, I would normally use the ApacheBench (“ab”) load generating tool for this type of simple experiment, but I believe ab has an issue with calling localhost (or the Docker config was presenting some problems for me).

Examining the Rate Limiter Service

The code for the Ambassador Java rate limiting service can be found in the repo ambassador-java-rate-limiter on my GitHub account. In this repo you will find not only the code, but also the Dockerfile I have used to build the container image that I pushed to DockerHub. Using this Dockerfile as a template, you can make modifications to the code and then build and push your own image to DockerHub. You can then modify the ambassador-rate-limiter.yaml file in the main Docker Java Shopping repo to use your service for rate limiting.

Exploring the Java Code

If you now dive into the actual Java code, the main class of interest is RateLimiterServer, which implements the rate limiting gRPC interface defined by the Envoy proxy that is used within the Ambassador API gateway. I’ve created a local copy of the ratelimit.proto interface, which is used by the gRPC Java build tooling defined in the Maven pom.xml. There are three primary points of interest in the code: implementing the gRPC interface, running the gRPC server, and implementing the actual rate limiting code. Let’s now look at these in turn.

Implementing the Rate Limiting gRPC Interface

If you look into the inner class within RateLimitServer, named “RateLimiterImpl”, which extends RateLimitServiceGrpc.RateLimitServiceImplBase, you can see that I have overridden a method from this abstract class:

public void shouldRateLimit(Ratelimit.RateLimitRequest rateLimitRequest, StreamObserver<Ratelimit.RateLimitResponse> responseStreamObserver)

A lot of the naming conventions used here come from the Java gRPC libraries, and for more information you can consult the gRPC Java documentation. Having said this, you can clearly see the root of many of the names if you look into the ratelimit.proto file that defines the rate limiting interface expected by the Envoy proxy used behind the scenes of Ambassador. For example, you can see that the core service defined in this file is named RateLimitService (line 9), and there is a single RPC method defined within the service, “rpc ShouldRateLimit (RateLimitRequest) returns (RateLimitResponse) {}” (line 11), which is implemented in Java through the “shouldRateLimit” method signature shown above.

If you are interested, a lot of the Java gRPC code generation magic is conducted by the “protobuf-maven-plugin” (line 99 of the pom.xml).

Running the gRPC server

Once you have implemented the gRPC interface defined with ratelimit.proto, the next thing to do is to create a gRPC server that can listen and reply to requests made to it. If you look into the content of the RateLimitServer, you can follow the chain of processing from the main method. In a nutshell, the main method creates an instance of the RateLimitServer class, calls the start() method, and then calls the blockUntilShutdown() method. This starts an instance of the class, exposes the gRPC interface on the defined port, and listens for requests.
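The start/block-until-shutdown flow described above can be sketched as follows. Note this is a simplified stand-in: the real class wraps an io.grpc.Server (built via ServerBuilder.forPort(...).addService(...).build()), which is not reproduced here so the example stays dependency-free, and a CountDownLatch plays the role of the server’s awaitTermination():

```java
import java.util.concurrent.CountDownLatch;

// Simplified lifecycle sketch; comments note what the real gRPC server code does.
public class ServerLifecycle {
    private final CountDownLatch terminated = new CountDownLatch(1);
    private volatile boolean started;

    public void start() {
        // Real code: server = ServerBuilder.forPort(50051).addService(...).build().start();
        started = true;
        // Shut the server down cleanly when the JVM exits.
        Runtime.getRuntime().addShutdownHook(new Thread(this::stop));
    }

    public void stop() {
        terminated.countDown();   // real code: server.shutdown()
    }

    public boolean isStarted() {
        return started;
    }

    public void blockUntilShutdown() {
        try {
            terminated.await();   // real code: server.awaitTermination()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The key design point is that the main thread does nothing after start() except wait, so the process stays alive for as long as the gRPC server is serving requests.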

Implementing the Java Rate Limiting Code

The actual Java code responsible for the rate limiting process is contained within the shouldRateLimit() method (line 75) of the RateLimiterImpl inner class. Rather than implementing my own rate limiting algorithm, I’m using the popular bucket4j Java rate limiting library, which is based on the token-bucket algorithm. As I am limiting the requests made to each service, each bucket is identified (or keyed) by the service name, and every request to a service removes a token from the associated bucket. In this example I am not storing the buckets in an external database, and have instead opted to use an in-memory ConcurrentHashMap. If I were implementing this service for a production use case, I would typically use an external persistence store, probably something like Redis, to enable horizontal scalability. For now you will have to bear in mind that if you horizontally scale the rate limit service without changing each service’s bucket limits, you will increase the number of allowable (non-rate limited) requests in direct proportion to the number of rate limiter instances.
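The per-service bucket lookup described above can be sketched with ConcurrentHashMap.computeIfAbsent, which guarantees a single bucket per service name even under concurrent requests. The method name here mirrors the article’s getServiceBucketFor, but the Bucket placeholder is an assumption standing in for bucket4j’s Bucket type so the sketch is self-contained:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the in-memory, per-service bucket registry.
public class BucketRegistry {

    // Placeholder for the bucket4j Bucket type used by the real service.
    static class Bucket {
        long tokens = 20;
    }

    private final ConcurrentMap<String, Bucket> buckets = new ConcurrentHashMap<>();

    // Lazily create exactly one bucket per destination service name. Because the
    // map is local to this process, each rate limiter replica holds its own
    // buckets, which is why horizontal scaling multiplies the effective limit.
    public Bucket getServiceBucketFor(String serviceName) {
        return buckets.computeIfAbsent(serviceName, name -> new Bucket());
    }
}
```

Swapping the ConcurrentHashMap for a shared store such as Redis would give all replicas a single view of each bucket, at the cost of a network hop per rate limit decision.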

An excerpt of the RateLimiterImpl code that creates the bucket4j bucket can be seen below:

private Bucket createNewBucket() {
    long overdraft = 20;
    Refill refill = Refill.smooth(10, Duration.ofSeconds(1));
    Bandwidth limit = Bandwidth.classic(overdraft, refill);
    return Bucket4j.builder().addLimit(limit).build();
}

The shouldRateLimit method code can be seen below; it simply attempts to tryConsume(1), i.e. to consume one token from the bucket, before returning an appropriate response code.

@Override
public void shouldRateLimit(Ratelimit.RateLimitRequest rateLimitRequest,
                            StreamObserver<Ratelimit.RateLimitResponse> responseStreamObserver) {
    logDebug(rateLimitRequest);

    String destServiceName = extractDestServiceNameFrom(rateLimitRequest);
    Bucket bucket = getServiceBucketFor(destServiceName);

    Ratelimit.RateLimitResponse.Code code;
    if (bucket.tryConsume(1)) {
        code = Ratelimit.RateLimitResponse.Code.OK;
    } else {
        code = Ratelimit.RateLimitResponse.Code.OVER_LIMIT;
    }

    Ratelimit.RateLimitResponse rateLimitResponse = generateRateLimitResponse(code);
    responseStreamObserver.onNext(rateLimitResponse);
    responseStreamObserver.onCompleted();
}

The code should be relatively self-explanatory; the primary responsibility of this method is to return either Ratelimit.RateLimitResponse.Code.OK, if no rate limiting is required for the current request, or Ratelimit.RateLimitResponse.Code.OVER_LIMIT, if the request should be denied due to rate limiting. Depending on the response from this gRPC service, the Ambassador API gateway will either pass the request through to the backend service, or short-circuit the trip and simply return a 429 “Too Many Requests” HTTP status code without calling the backend service.

This simple example protects a single service from becoming overwhelmed, but hopefully it also demonstrates the core rate limiting concepts, and could be relatively easily adapted to rate limit based on request metadata, such as a user ID or something similar.

Until the Next Time…

This article has demonstrated how you can create a rate limiting service in Java that can easily be integrated with the Ambassador API gateway, and can be fully customised with any rate limiting logic you require. In the next and final article of the series you will explore the Envoy rate limiting API in more depth, in order to learn more about designing a rate limiting service.

Please feel free to jump on the Ambassador Gitter if you have any questions, or send a tweet over to @danielbryantuk or @datawireio.

Continue reading the other articles in this four part series:

Part 1: Rate Limiting: A Useful Tool with Distributed Systems

Part 2: Rate Limiting for API Gateways

Part 4: Designing a Rate Limiting Service for Ambassador