Setting the Scene

The first symptom our vigilant Network Operations Center noticed was a drastic drop in the number of League games starting.

Not typical behavior

A lot of systems go into a successful game creation, from matchmaking to load distribution to the game server itself, and a symptom like this doesn't make it immediately clear where the breakdown is. When we brought the experts on each of those systems into triage to make that determination, they all reported that their service's health looked good, but that very little traffic was reaching it. A matchmaker with few people coming in to request a match doesn't have much to do.

All of that indicated that we were dealing with a systemic issue in getting player traffic into our backend. Metrics showed us some major problems in that neighborhood:

The horizontal red line all the way at the bottom is where sirens go off

What we're looking at here is the number of inbound connections to each of our generic container hosts. Those are highly capable single computers that run a bunch of smaller applications (in shop-talk, containers) that together make up the overall system that runs League. Two of those hosts are getting vastly more connections than is reasonable. To understand why, we need to talk about a particular kind of container, one performing an "edge" function.

