Encouragingly, traffic to the carwow applications has grown significantly over the last couple of years, and of course we hope this trend continues. Our Software Engineering Tools & Infrastructure (SETI 👽) Team has the task of ensuring that our applications, infrastructure, and tooling continue evolving to support the business as it grows.

Recently, SETI has been investigating how our applications perform under significant increases in traffic. One of the methods we’ve used is commonly known as shadow requesting, and it’s been useful enough to us that I’d like to share it here.

So what is shadow requesting?

At its core, shadow requesting means replaying live requests shortly after they have completed, either to the same host (to simulate additional load) or to a different host (to verify parity of responses).

Why shadow requests?

One of the benefits of load testing via shadow requests is that we are simulating real traffic patterns without any configuration overhead; by replaying the requests almost in real-time we create a lifelike scenario of additional users on our systems without needing to maintain a repository of user flows and their associated HTTP requests.

How are we doing this?

We developed a tool, umbra, to facilitate shadow requesting. It is a Rack middleware that publishes incoming requests, via a thread-backed queue, to a Redis pub/sub channel. In a separate process (or processes), a subscriber listens on the same channel and replays each request.
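To make the publishing side concrete, here is a minimal sketch of the pattern (not umbra’s actual source): a Rack middleware serialises each request and hands it to a background thread through a queue, so publishing never blocks the live request. The channel name is hypothetical, and the `redis` argument is any object responding to `#publish(channel, payload)` — in production this would be a client from the redis gem.

```ruby
require "json"

class ShadowPublisher
  CHANNEL = "shadow_requests" # hypothetical channel name

  # `redis` must respond to #publish(channel, payload),
  # e.g. Redis.new from the redis gem in a real deployment.
  def initialize(app, redis:)
    @app = app
    @queue = Queue.new
    # Background thread drains the queue and publishes each payload,
    # keeping the request path free of network I/O to Redis.
    @publisher = Thread.new do
      loop { redis.publish(CHANNEL, @queue.pop) }
    end
  end

  def call(env)
    # Enqueue a serialised copy of the request for the subscriber to replay.
    @queue << JSON.dump(
      method: env["REQUEST_METHOD"],
      path:   env["PATH_INFO"],
      query:  env["QUERY_STRING"]
    )
    @app.call(env) # the live request proceeds untouched
  end
end
```

A separate subscriber process would then `SUBSCRIBE` to the same channel and replay each payload with an HTTP client against the target host.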

I’ve added the current umbra configuration for one of our applications in this gist.

What do we need to watch out for?

We currently configure umbra to shadow only idempotent requests. As a general rule this means only GET requests; however, we also use an allow/deny list of endpoints to ensure we are not inadvertently manipulating production data.

We also made sure that umbra guarantees never to replicate its own requests, preventing an exponential increase in request volume and an eventual self-inflicted denial of service.
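These two safeguards could be sketched roughly as follows (again, not umbra’s actual implementation): only idempotent GET requests to allow-listed paths are shadowed, and any request already carrying a shadow marker header is never re-shadowed. The header name and paths here are hypothetical.

```ruby
SHADOW_HEADER = "HTTP_X_SHADOW_REQUEST"          # hypothetical marker header
ALLOWED_PATHS = [%r{\A/search}, %r{\A/reviews}]  # hypothetical allow list

def shadowable?(env)
  return false unless env["REQUEST_METHOD"] == "GET" # idempotent requests only
  return false if env[SHADOW_HEADER]                 # never replay a replay
  ALLOWED_PATHS.any? { |re| re.match?(env["PATH_INFO"].to_s) }
end
```

The replayer would set the marker header on every request it sends, so a shadow request that loops back through the middleware is dropped rather than re-published.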

We have rich monitoring and alerts, powered by Honeycomb. This enables us to monitor our applications and pinpoint the moment they start to ‘sweat’. Without good monitoring, we would not gain any insight into how our applications behave under the artificially increased load we add via shadow requesting.

What are some of the outcomes?

We recently tested a new middleware using umbra. One of our applications receives around 50% of its traffic from internal clients. We already set client-side timeouts on internal requests in order to mitigate cascading performance degradation.

Under significant load it is possible that our servers queue requests for longer than the client’s timeout period. When this occurs, those requests are usually still processed by the server, even though the client has already ‘given up’.

We added a middleware that drops any request that took longer to reach our application — from initial ingestion at the routing layer to reaching this middleware — than the timeout set by the client (via an X-Client-Timeout header set by all our internal HTTP clients). Using umbra we were able to test this middleware under load safely, and the results were positive!
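A minimal sketch of that load-shedding idea might look like this, assuming the router stamps each request with an X-Request-Start header in milliseconds since the epoch (as Heroku’s router does) and that X-Client-Timeout carries the client’s timeout in seconds. The status code and response body are illustrative choices, not necessarily what our middleware returns.

```ruby
class LoadShedder
  def initialize(app)
    @app = app
  end

  def call(env)
    started_ms = env["HTTP_X_REQUEST_START"].to_f   # set by the router
    timeout_s  = env["HTTP_X_CLIENT_TIMEOUT"].to_f  # set by internal clients

    if started_ms > 0 && timeout_s > 0
      # Time the request has spent queued before reaching this middleware.
      queued_s = (Time.now.to_f * 1000 - started_ms) / 1000.0
      if queued_s > timeout_s
        # The client has already given up; don't waste server time.
        return [503, { "content-type" => "text/plain" }, ["shed"]]
      end
    end

    @app.call(env)
  end
end
```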

We enabled the middleware only for web processes/dynos that were evenly numbered, and only for shadow requests, allowing us to compare the performance of web requests being processed with and without the new load-shedding middleware enabled.
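The gating for that experiment could be sketched as below, assuming Heroku’s `DYNO` environment variable (values like `"web.3"`) and the same hypothetical shadow marker header as before: the load-shedding middleware only engages on even-numbered dynos, and only for shadow requests.

```ruby
# Returns true when the experimental middleware should run for this request:
# the process is an even-numbered dyno AND the request is a shadow replay.
def shed_enabled?(env, dyno: ENV["DYNO"])
  index = dyno.to_s[/\d+\z/].to_i          # "web.3" -> 3; nil -> 0
  index.positive? && index.even? && !env["HTTP_X_SHADOW_REQUEST"].nil?
end
```

Because shadow requests mirror live traffic, comparing latencies between odd and even dynos then isolates the effect of the middleware itself.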