To get started with machine learning, I took on a small project: a script that would predict the brackets of The International. The International is the annual DOTA 2 tournament hosted by Valve, and it is the biggest tournament in all of eSports in terms of winnings. Thanks to crowdfunding, the current prize pool stands at $19,842,840 and is projected to exceed $20,000,000 by the end of the tournament.

The International 2016 homepage

While doing research I came across Matt Harvey’s machine learning model for the 2016 NCAA March Madness Tournament. I decided to use his open-source script but apply it to DOTA 2 data.

Getting The Data

The first thing I needed was match data for the teams participating in The International. I reached out to Howard of yasp.co, who directed me to the yasp database explorer, which let me get the match_id of every pro DOTA 2 match they tracked.

Getting all match_ids in JSON format

Using these match_ids, I then used Valve’s own Steam Web API to retrieve data for each of the matches.

Example data from the Valve Steam Web API
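The retrieval step can be sketched roughly as follows. This is a minimal sketch, not the script I actually ran: it assumes the `GetMatchDetails` method of the Steam Web API's `IDOTA2Match_570` interface and an API key of your own, and the helper names `build_match_url` and `fetch_match` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: GetMatchDetails on the IDOTA2Match_570 interface.
API_BASE = "https://api.steampowered.com/IDOTA2Match_570/GetMatchDetails/V001/"


def build_match_url(match_id, api_key):
    """Build the request URL for a single match (hypothetical helper)."""
    query = urllib.parse.urlencode({"match_id": match_id, "key": api_key})
    return f"{API_BASE}?{query}"


def fetch_match(match_id, api_key, timeout=10):
    """Download one match and return the contents of its "result" object."""
    url = build_match_url(match_id, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    # The match details are assumed to sit under a top-level "result" key.
    return payload.get("result", {})
```

Calling `fetch_match` once per match_id from the yasp export, with a short pause between requests, would be one way to collect the full data set.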

Transforming The Data

I decided to change Matt’s script as little as possible, so I needed to transform the JSON data into the CSV format that his script uses. I did this with Python’s ijson library.
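To give a rough idea of what that transformation involves, here is a minimal sketch. It swaps ijson for the standard `json`-style dictionaries and the `csv` module to stay self-contained, and it is not the code I actually used. The column layout is hypothetical (it is not the exact format Matt’s script expects), and the field names (`players`, `player_slot`, `radiant_score`, and so on) are assumptions about the shape of the match data. Player slots below 128 are treated as Radiant, the rest as Dire.

```python
import csv

# Per-player fields assumed to be present in each match's "players" list.
STAT_FIELDS = ["kills", "deaths", "assists", "last_hits", "denies",
               "hero_damage", "tower_damage", "hero_healing",
               "xp_per_min", "gold_per_min"]


def team_rows(match):
    """Return two rows (Radiant, Dire) with player stats summed per team."""
    totals = {"radiant": dict.fromkeys(STAT_FIELDS, 0),
              "dire": dict.fromkeys(STAT_FIELDS, 0)}
    for player in match.get("players", []):
        # Assumption: slots 0-4 are Radiant, slots 128-132 are Dire.
        side = "radiant" if player.get("player_slot", 0) < 128 else "dire"
        for field in STAT_FIELDS:
            totals[side][field] += player.get(field, 0)
    rows = []
    for side in ("radiant", "dire"):
        row = {"match_id": match.get("match_id"), "side": side,
               "score": match.get(f"{side}_score", 0)}
        row.update(totals[side])
        rows.append(row)
    return rows


def write_csv(matches, out):
    """Write one CSV row per team per match to the file-like object `out`."""
    columns = ["match_id", "side", "score"] + STAT_FIELDS
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for match in matches:
        writer.writerows(team_rows(match))
```

Summing the per-player values gives team totals; averaging them instead would be an equally reasonable choice.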

Changing The Parameters

Matt’s parameter labels were for basketball data, so I had to change them to the appropriate DOTA 2 parameters. From the data I had, I identified several variables that I thought would be good criteria for the predictions:

Score | Kills | Deaths | Assists | Last Hits | Denies | Hero Damage | Tower Damage | Hero Healing | XP per minute | Gold per minute

I was unsure whether some of the stats were applicable, or even accurate (a lot of matches, for example, had scores of 0 for both teams). Still, I looked through the predictions and they *generally* felt right; OG, for example, was predicted to win a lot of its matchups.
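One way to gauge how widespread the zero-score problem is would be to separate those matches out before training. The sketch below is only an illustration of that idea, not something I ran for these predictions; the `radiant_score` and `dire_score` field names are assumptions about the match data, and both helper names are hypothetical.

```python
def has_missing_scores(match):
    """True when both team scores are 0 or absent, which suggests missing data."""
    return match.get("radiant_score", 0) == 0 and match.get("dire_score", 0) == 0


def split_by_score(matches):
    """Split matches into (scored, missing) lists based on their team scores."""
    scored = [m for m in matches if not has_missing_scores(m)]
    missing = [m for m in matches if has_missing_scores(m)]
    return scored, missing
```

Comparing `len(missing)` with the total number of matches shows how much of the data set the Score column can be trusted for.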

And Now… The Predictions!

This is how I’ve filled out my Compendium brackets based on the results: