Netflix has published a blog post on strategies for mitigating application DDoS attacks in microservice architectures. It includes an overview of how to identify the requests that trigger these attacks, how to test for them with their open source Repulsive Grizzly and Cloudy Kraken frameworks, and finally some best practices for protecting a system against them.

Scott Behrens, senior application security engineer at Netflix, and Bryan Payne, product and application security lead at Netflix, first point out that microservice architectures are particularly susceptible to application DDoS attacks. This is because a single expensive API call can fan out into many network hops across services, effectively causing the system to attack itself:

"A single request in a microservices architecture may generate tens of thousands of complex middle tier and backend service calls."

The first challenge posed by these application DDoS attacks is identification. How can a request that looks like a legitimate API call from a user be detected at the edge as one that will trigger heavy resource utilization internally?

One of the first strategies outlined is measuring how long API calls take. Rather than looking at the front tier, which may give false positives, it is more effective to monitor request times for back-end services. Slow back-end requests can then be reverse-engineered to determine what sort of original API calls could have triggered them.
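The blog does not prescribe specific tooling for this step. As an illustration only, a crude scan over back-end latency samples might look like the following Python sketch; the endpoint names and latency figures are hypothetical, and in practice the data would come from tracing or access logs:

```python
from collections import defaultdict
from statistics import median

# Hypothetical latency samples per back-end service call, in milliseconds.
records = [
    ("search-middle-tier", 35), ("search-middle-tier", 40),
    ("search-middle-tier", 40), ("search-middle-tier", 2200),
    ("user-profile", 12), ("user-profile", 15),
]

latencies = defaultdict(list)
for endpoint, ms in records:
    latencies[endpoint].append(ms)

# Flag endpoints whose worst observed call dwarfs the typical one --
# a rough signal that some inputs are disproportionately expensive.
flagged = []
for endpoint, samples in latencies.items():
    typical = median(samples)
    worst = max(samples)
    if worst > 10 * typical:
        flagged.append(endpoint)
        print(f"{endpoint}: median {typical}ms, worst {worst}ms -- investigate")
```

A flagged endpoint is then the starting point for working backwards to the edge API call that reached it.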

Once the developer has found these API calls, the next step is to examine the request itself and find ways to make it more expensive. The example given is a range parameter in a search request, which can be increased to produce a larger result set. Useful indicators that the correct request has been identified include error signals such as rate limiting and exceptions, or simply increased latency.
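This probing step can be sketched as follows. The `search` stub and its cost model are invented for illustration; a real probe would issue HTTP requests to the suspect endpoint and observe status codes and latency instead:

```python
import time

# Hypothetical stand-in for the system under test: cost grows with the
# requested range, mimicking an expensive result-set expansion.
def search(range_size):
    time.sleep(range_size * 0.0001)  # simulated back-end work
    return {"status": 200, "results": range_size}

# Step up the range parameter and watch for the indicators mentioned
# above: rate limiting, exceptions, or simply increased latency.
for range_size in (10, 100, 1000):
    start = time.perf_counter()
    response = search(range_size)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"range={range_size}: status {response['status']} in {elapsed_ms:.1f}ms")
```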

Once these sorts of requests are identified, the authors suggest running them with Repulsive Grizzly, an application-layer DDoS testing framework. It is not an identification tool, but streamlines the testing process by replaying these requests against the system under test.

They also introduce Cloudy Kraken, an open source AWS tool which orchestrates the test runs at global scale. It does this by managing a fleet of scalable, cross-region AWS instances, each running the Repulsive Grizzly test suite. It also provides time synchronization, making sure that the tests run in parallel.

Once requests have been identified and validated through testing, the following mitigation strategies are suggested:

Produce an architecture which minimizes dependencies between microservices. If a service fails, it should ideally fail in isolation, without breaking any others.

Understand how services queue and serve requests. For example, limit batch sizes or the number of objects requested.

Provide a feedback loop from the back-end services to the web application firewall. This gives it extra information about the downstream resource utilization of an API call, which would not otherwise be determinable at the edge.

Monitor cache misses, as too many could mean that the cache is not configured correctly.

Leverage resilience patterns in clients, such as circuit breakers and timeouts.
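To illustrate the last point, the following is a minimal circuit breaker in Python. It is a sketch, not Netflix's implementation; the class name and thresholds are assumptions:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive
    failures, reject calls for `reset_after` seconds instead of
    letting them pile up on a struggling downstream service."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Half-open: allow one trial call through.
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

Failing fast in this way keeps an expensive, degraded call from being retried into the very amplification the attack relies on.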

The full blog post is available to read online, with a case study and a more in-depth analysis of the subject.