My name is Jonathan McCaffrey and I work on the infrastructure team here at Riot. This is the first post in a series where we’ll go deep on how we deploy and operate backend features around the globe. Before we dive into the technical details, it’s important to understand how Rioters think about feature development. Player value is paramount at Riot, and development teams often work directly with the player community to inform features and improvements. In order to provide the best possible player experience, we need to move fast and maintain the ability to rapidly change plans based upon feedback. The infrastructure team’s mission is to pave the way for our developers to do just that - the more we empower Riot teams, the faster features can be shipped to players to enjoy.

Of course, that’s easier said than done! A host of challenges arise given the diverse nature of our deployments - we have servers in public clouds, private datacenters, and partner environments like Tencent and Garena, all of which are geographically and technologically diverse. This complexity places a huge burden on feature teams when they are ready to ship. That’s where the infrastructure team comes in - we’ve recently made progress in removing some of these deployment hurdles with a container-based internal cloud environment that we call ‘rCluster.’ In this article I’ll discuss Riot’s journey from manual deploys to the current world of launching features with rCluster. As an illustration of rCluster’s offerings and technology, I’ll walk through the launch of the Hextech Crafting system.

A bit of history

When I started at Riot 7 years ago, we didn't have much of a deployment or server management process; we were a startup with big ideas, a small budget, and a need to move fast. As we built the production infrastructure for League of Legends, we scrambled to keep up with the demand for the game, demand to support more features from our developers, and demand from our regional teams to launch in new territories around the world. We stood up servers and applications manually, with little regard to guidelines or strategic planning.

Along the way, we moved towards leveraging Chef for many common deployment and infrastructure tasks. We also started using more and more public cloud for our big data and web efforts. These evolutions triggered changes in our network design, vendor choices, and team structure multiple times.

Our datacenters housed thousands of servers, with new ones installed for almost every new application. New servers would exist on their own manually created VLAN with handcrafted routing and firewall rules to enable secure access between networks. While this process helped us with security and clearly defined fault domains, it was arduous and time-consuming. To compound the pain of this design, most of the newer features at the time were designed as small web services, so the number of unique applications in our production LoL ecosystem skyrocketed.

On top of this, our development teams lacked confidence in their ability to test their applications, particularly when it came to deploy-time issues like configuration and network connectivity. Having the apps tied so closely to the physical infrastructure meant that differences between production datacenter environments were not replicated in QA, Staging, and PBE. Each environment was handcrafted, unique, and in the end, consistently inconsistent.

While we grappled with these challenges of manual server and network provisioning in an ecosystem with an ever-increasing number of applications, Docker started to gain popularity amongst our development teams as a means to solve problems around configuration consistency and development environment woes. Once we started working with it, it was clear that we could do more with Docker, and it could play a critical role in how we approached infrastructure.

Season 2016 and beyond

The infrastructure team set a goal to solve these problems for players, developers, and Riot for Season 2016. By late 2015, we went from deploying features manually to deploying features like Hextech Crafting in Riot regions in an automated and consistent fashion. Our solution was rCluster - a brand new system that leveraged Docker and Software Defined Networking in a micro-service architecture. Switching to rCluster would pave over the inconsistencies in our environments and deployment processes and allow product teams to focus squarely on their products.

Let’s dive into the tech a bit to examine how rCluster supports a feature like Hextech Crafting behind the scenes. For context, Hextech Crafting is a feature within League of Legends that provides players a new method of unlocking in-game content.

The feature is known internally as “Loot,” and is composed of 3 core components:

Loot Service - A Java application serving Loot requests over an HTTP/JSON REST API.

Loot Cache - A caching cluster using Memcached and a small golang sidecar for monitoring, configuration, and start/stop operations.

Loot DB - A MySQL DB cluster with a master and multiple slaves.
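The internals of the Loot Cache sidecar aren’t public, but its monitoring duty boils down to polling memcached’s text protocol and reporting stats. A minimal sketch in Go of parsing a memcached "stats" response (the `parseStats` helper and the sample stats are illustrative assumptions, not Riot’s actual sidecar code):

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseStats parses the response to memcached's "stats" command
// (lines of the form "STAT <name> <value>", terminated by "END")
// into a map. Hypothetical helper for illustration only.
func parseStats(raw string) map[string]string {
	stats := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 3 && fields[0] == "STAT" {
			stats[fields[1]] = fields[2]
		}
	}
	return stats
}

func main() {
	// In a real sidecar this text would come from a TCP connection
	// to the local memcached instance; here it is hardcoded.
	raw := "STAT curr_items 42\r\nSTAT get_hits 1000\r\nSTAT get_misses 250\r\nEND\r\n"
	stats := parseStats(raw)
	hits, _ := strconv.ParseFloat(stats["get_hits"], 64)
	misses, _ := strconv.ParseFloat(stats["get_misses"], 64)
	fmt.Printf("items=%s hit_rate=%.2f\n", stats["curr_items"], hits/(hits+misses))
}
```

A sidecar like this lets the cache cluster expose health and hit-rate data without baking monitoring logic into memcached itself.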

When you open the crafting screen, here is what happens:

1. A player opens the crafting screen in the Client.
2. The Client makes an RPC call to the frontend application, aka “feapp,” which proxies calls between players and internal backend services.
3. The feapp looks up the Loot Service in “Service Discovery” to find its IP and port information.
4. The feapp makes an HTTP GET call to the Loot Service.
5. The Loot Service checks the Loot Cache to see if the player’s inventory is present.
6. The inventory isn’t in the cache, so the Loot Service calls Loot DB to see what the player currently owns and populates the cache with the result.
7. The Loot Service replies to the GET call.
8. The feapp sends the RPC response back to the Client.
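The cache-then-DB lookup the Loot Service performs is a classic cache-aside read. A minimal sketch in Go, with the cache and DB modeled as in-memory maps (the `LootStore` and `GetInventory` names are illustrative, not Riot’s actual code):

```go
package main

import "fmt"

// LootStore is a toy stand-in for the Loot Service's two backends.
type LootStore struct {
	cache map[string][]string // Loot Cache: memcached in production
	db    map[string][]string // Loot DB: MySQL in production
}

// GetInventory implements the cache-aside read path: check the cache
// first, fall back to the DB on a miss, then populate the cache with
// the result so the next read is served from memory.
func (s *LootStore) GetInventory(playerID string) []string {
	if inv, ok := s.cache[playerID]; ok {
		return inv // cache hit
	}
	inv := s.db[playerID]   // cache miss: read from the DB
	s.cache[playerID] = inv // populate the cache for next time
	return inv
}

func main() {
	store := &LootStore{
		cache: map[string][]string{},
		db:    map[string][]string{"player1": {"hextech-chest", "key-fragment"}},
	}
	fmt.Println(store.GetInventory("player1")) // first call: served from DB
	fmt.Println(store.GetInventory("player1")) // second call: served from cache
}
```

The design keeps the hot inventory data in the Loot Cache while the Loot DB remains the source of truth.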

Working with the Loot team, we were able to get the Server and Cache layers built into Docker containers, and their deployment configuration defined in JSON files that looked like this:

Loot Server JSON Example:

{ "name": "euw1.loot.lootserver", "service": { "appname": "loot.lootserver", "location": "lolriot.ams1.euw1_loot" }, "containers": [ { "image": "compet/lootserver", "version": "0.1.10-20160511-1746", "ports": [] } ], "env": [ "LOOT_SERVER_OPTIONS=-Dloot.regions=EUW1", "LOG_FORWARDING=true" ], "count": 12, "cpu": 4, "memory": 6144 }

Loot Cache JSON Example: