EDIT:

Thank you to all the Redditors who pointed out an oversight on my part. I calculated the win-rate matrix with the entirety of my data, not just the test data. I’ll be leaving the article as-is so people can see my mistakes, but I’ve amended it with the updated prediction accuracy rating. Enjoy!

Intro

How many times have you been watching a hyped rivalry match in the LEC and out of nowhere, during Pick-and-Ban of all times, the crowd starts going off as if a game-changing gank was coming and the other team had absolutely no vision on it? If you’re like me, it happens pretty much every time two powerhouse teams load into the server. Professional League of Legends players love nothing more than to troll their opponents with riot-inciting hovers, pick off-meta Champions that they’ve only practiced on their 95th alt in SoloQ, and, of course, bring back historic pocket picks after months, and sometimes years, of dormancy. Having only been in the scene since the start of 2018, I didn’t catch on to these occurrences until Spring 2019, a year after I started with Team Liquid’s League of Legends team. Around that same time, I also learned just how much the Pick/Ban phase can alter the course of a match, which leads me to our topic: Can you predict the outcome of a match based solely on data from the Pick/Ban phase?

Let’s Assess The Situation

Before we get neck-deep in data, let’s take a step back and look at the bigger picture. What are we doing here, and what information do we have readily available after Pick/Ban and before the kind voice of the announcer says “Welcome to Summoner’s Rift”? What we’re doing is predicting the outcome of a game based only on factors from the pre-game. There are two possible outcomes of a game: winning or losing. That makes this a binary classification problem, so we can utilize one of Machine Learning’s most well-loved binary classifiers, logistic regression. What information, or features, do we have access to when it comes to pre-game data?

Side selection

Champions & Positions

Matchups

So…not much, but that’s not to say we don’t have some meaningful information already at our fingertips! We’re going to have to be creative data engineers and make the most of our less-than-ideal situation. When you’re a professional League of Legends analyst, you have an intuition about these data points, like which side is better to be on based on whether you want to counter-pick position x or y against your opponent, or how your Jungle-Mid synergy works in comparison to the opposing team’s. One thing you don’t have, at least not yet, is a built-in computer in your brain that can run through thousands of epochs of training data and give you more than those recency-based intuitions.
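To make the win-or-lose framing a little more concrete, here is a minimal sketch of how a logistic regression turns pre-game features into a win probability. Everything specific in it is an assumption for illustration: the two features (`is_blue_side` and a hypothetical `matchup_edge`, meaning a team's average matchup win-rate minus 0.5), the invented toy rows, and the hand-rolled gradient descent, which stands in for whatever library you would actually use.

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Estimated probability that the team described by `features` wins."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias with plain batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

# Invented toy data, NOT real match results.
# Each row: [is_blue_side, matchup_edge]; label 1 = win, 0 = loss.
rows = [
    [1, 0.10], [0, 0.05], [1, 0.08],
    [0, -0.06], [1, -0.10], [0, -0.04],
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(rows, labels)
favorable = predict_proba(weights, bias, [1, 0.10])
unfavorable = predict_proba(weights, bias, [1, -0.10])
print(f"favorable draft: {favorable:.2f}, unfavorable draft: {unfavorable:.2f}")
```

The useful part is that the model outputs a probability rather than a hard win/loss call, so you can see how confident it is about a given draft before thresholding at 0.5.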

Arguably the largest factor in Pick/Ban, and the one I’ve decided to base this project on, is going to be Matchups, or how any given Champion plays into and against team compositions. Given enough games, the law of large numbers tells us that the observed win-rate of a matchup should settle toward its true value, which gives us a reasonable representation of a Champion’s general usability against an opposing team.

Quantification of a Matchup

How do you quantify a matchup? While there is a certain level of entropy that we could, and eventually should, account for, today we’re going to be focusing on one of the more obvious features: win-rates! I’d been putting off creating this data set for some time, and it was time to finally build it. What I ended up forging was a matrix of champion matchup win-rates based on every professional game from 2018-19 across all major regions and a few minor regions. With this matrix, we can now start quantifying team compositions as they relate to the opposing team.
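As a rough illustration of what such a matrix could look like in code, here is a minimal sketch. It is not my exact pipeline: the game format, the function names, the choice to pair every Champion with every opposing Champion, and the shortened two-Champion toy "teams" with invented results are all assumptions. It also bakes in the lesson from the edit note at the top: split the games first, then build the matrix from the training games only.

```python
from collections import defaultdict
from itertools import product

def build_winrate_matrix(games):
    """Return {(champion, opposing_champion): win-rate} from a list of games."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for game in games:
        blue_won = game["winner"] == "blue"
        # Pair every blue Champion with every red Champion, in both directions.
        for b, r in product(game["blue"], game["red"]):
            played[(b, r)] += 1
            played[(r, b)] += 1
            wins[(b, r)] += int(blue_won)
            wins[(r, b)] += int(not blue_won)
    return {pair: wins[pair] / played[pair] for pair in played}

def team_matchup_score(matrix, team, opponents, default=0.5):
    """Average matchup win-rate of `team` into `opponents`.

    Pairs never seen in the matrix fall back to `default` (a coin flip).
    """
    rates = [matrix.get((a, b), default) for a, b in product(team, opponents)]
    return sum(rates) / len(rates)

# Invented toy games, NOT real match results.
all_games = [
    {"blue": ["Aatrox", "Lee Sin"], "red": ["Jayce", "Elise"], "winner": "blue"},
    {"blue": ["Jayce", "Lee Sin"], "red": ["Aatrox", "Elise"], "winner": "red"},
    {"blue": ["Aatrox", "Elise"], "red": ["Jayce", "Lee Sin"], "winner": "blue"},
    {"blue": ["Jayce", "Elise"], "red": ["Aatrox", "Lee Sin"], "winner": "blue"},
]

# Split first, then build the matrix from training games only, so the
# results of held-out games never leak into the features.
train_games, test_games = all_games[:3], all_games[3:]
matrix = build_winrate_matrix(train_games)

held_out = test_games[0]
blue_score = team_matchup_score(matrix, held_out["blue"], held_out["red"])
print(f"blue matchup score for the held-out game: {blue_score:.3f}")
```

With only a handful of games behind each pair, these win-rates are very noisy, which is exactly why the number of games matters: a 100% win-rate over three games says far less than 55% over three hundred.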