In 1994 a married couple were invited to a meeting with the board of directors at the UK’s second largest supermarket. Edwina Dunn and Clive Humby presented to Tesco, and following an awkward silence, Lord MacLaurin uttered data science’s version of ‘one small step’:

“You know more about my customers after three months, than I know after 30 years”

Tesco launched Tesco Clubcard, the world’s first supermarket loyalty card, and unleashed the power of analysing shoppers’ baskets into the world.

Association rule mining is one way of doing this. It analyses the probability that sets of items will appear together in a shopper’s basket: when you buy eggs and bacon, you might buy sausages. When you buy nachos you might buy salsa. Hell, when you pick Shadow Demon you might pick Mirana.

Over the past few weeks I’ve been pulling data from Valve’s API and analysing it, using a couple of different methods and algorithms. In this first blog post I’m going to discuss this year’s International champions’ “unconventional & unpredictable” style of drafting.

The first section of this blog is a brief description of my methodology: how I analysed the data, a few choices I made and an extremely simplified overview of the algorithm I used. Feel free to skip to the second part if you’re only interested in my findings. It is pretty heavy, so I really don’t blame you.

Methodology

In this project I used an algorithm called apriori in R [a statistics package/programming language, a bit like MATLAB]. You input a data set into apriori and it outputs the patterns that it finds, as rules. It orders and evaluates these rules with three factors: support, confidence and lift.

Rules – Given in the format {if this happens} => {this probably happens too} e.g. {“RTZ Invoker”} => {“Win”}

RHS/LHS – Right hand side/Left hand side of a rule. The RHS refers to the, you guessed it, right hand side of the rule.

Support – Support says how often a set of items appears in the data. For example, if I were analysing Secret’s 14 games in the group stage, and Arteezy played Invoker in 7 of them, the LHS {“RTZ Invoker”} would have the support 7/14 = 0.5 (assuming the data set had one line for each match). Strictly speaking, the support of a whole rule counts only the matches that contain both its LHS and its RHS, so it can never be higher than the support of the LHS alone.

Confidence – Confidence is the probability that the RHS happens, given that the LHS has happened. Using our running example, say Arteezy won 4 of his Invoker games. Then the confidence of {“RTZ Invoker”} => {“Win”} would be 4/7 = ~0.57

Lift – This is the complicated one. Lift is the probability of the items on the LHS co-occurring with the items on the RHS, divided by the probability that they would co-occur if the two sides were independent. (Independence is just the baseline for the comparison, whether or not it is actually true.)

The real magic of lift is the result. Don’t worry about the maths of lift, all you really need to know is that if the lift is greater than 1, the rule holds quite well (the presence of the items on the LHS makes the items on the RHS more likely). If it’s close to 1, having something on the LHS doesn’t really make a difference to the probability of the items occurring on the RHS. If it’s below 1, the opposite of the first case is true: the LHS makes the RHS less likely.

So, when we analyse data we’re after rules with acceptable supports & confidences (how many matches they appear in) and lift above 1 (the higher the better)
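
To make the three measures concrete, here is a minimal sketch in Python (not the R code used for this project) that computes them for the running example. The 14 matches, 7 Invoker games and 4 Invoker wins come from the example above; the 3 wins in the non-Invoker games are invented purely so that lift can be calculated. The function name rule_metrics is also just an illustration.

```python
def rule_metrics(transactions, lhs, rhs):
    """Compute basic association-rule metrics for the rule lhs => rhs.

    transactions is a list of sets; lhs and rhs are sets of items.
    """
    lhs, rhs = set(lhs), set(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)           # matches containing the LHS
    n_rhs = sum(1 for t in transactions if rhs <= t)           # matches containing the RHS
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)  # matches containing both
    confidence = n_both / n_lhs
    return {
        "lhs_support": n_lhs / n,    # how often the LHS appears
        "rule_support": n_both / n,  # how often LHS and RHS appear together
        "confidence": confidence,    # P(RHS | LHS)
        "lift": confidence / (n_rhs / n),  # confidence relative to the baseline P(RHS)
    }


# Hypothetical data set: one set of items per match.
# ASSUMPTION: the 3 wins without Invoker are made up for illustration.
matches = (
    [{"RTZ Invoker", "Win"}] * 4   # Invoker games won
    + [{"RTZ Invoker"}] * 3        # Invoker games lost
    + [{"Win"}] * 3                # other games won (assumed)
    + [set()] * 4                  # other games lost (assumed)
)

metrics = rule_metrics(matches, {"RTZ Invoker"}, {"Win"})
# lhs_support is 0.5 and confidence is 4/7, as in the text;
# with the assumed win count, lift comes out a little above 1.
```

With these made-up numbers the overall win rate is 7/14, so the lift is (4/7) / (7/14), or about 1.14: picking Invoker makes a win slightly more likely than the baseline.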

Extra information for people with computer science interests

Apriori is a relatively inefficient algorithm, but there are little tricks you can do to make it a little faster. The algorithm works by running through a data set, finding the most common single items, forgetting about the less common ones and creating an item list. It then runs through the data again and tries to find pairs of the most common items, then triples etc., shortening the list every iteration to the most relevant item sets, decided by their support. Rules are then built from the surviving item sets and filtered by their confidence. The minimum values for both can be set by the programmer to make sure the results don’t overload your RAM (or weekend).

The majority of the rules in the Wings dataset had pretty good lift, due to me being fairly lazy (sorry, efficient) & choosing relatively high support & confidence values.
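
For the curious, the level-by-level search described above can be sketched in a few lines of Python. This is a simplified illustration of the item-set stage only, not the implementation behind R’s apriori; the function name frequent_itemsets and the shopping baskets are made up for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    transactions is a list of sets of items.
    """
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: keep only the single items that are common enough.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join: build size-k candidates from the surviving size-(k-1) sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate can only be frequent if all its subsets were.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count: run through the data again and drop the rare candidates.
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical shopping baskets, purely for illustration.
baskets = [
    {"eggs", "bacon", "sausages"},
    {"eggs", "bacon"},
    {"nachos", "salsa"},
    {"eggs", "bacon", "salsa"},
]
result = frequent_itemsets(baskets, min_support=0.5)
```

Raising min_support shrinks the item list at every level, which is where the RAM (and weekend) savings come from.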

Beginning the Project

The best start to a data science problem is to ask the questions you want answered by the data. With this in mind, these were the objectives I was working towards:

1. Do Wings have favourite heroes, or common/repeating hero pairs? What do teams ban against Wings?
2. Is there any correlation between certain picks & certain bans, for both the enemy team and Wings?
3. Is there a specific order to Wings’ drafting style? E.g. are supports picked before carries?

To answer these questions I created two little datasets to throw into the apriori algorithm. The first was to answer questions 1 & 2. I created a 25-line data file, one line per match, with each hero picked and banned. I gave heroes prefixes to differentiate them in the data. If it was a Wings ban, I put a ‘b’ in front of it, if it was an enemy pick, an ‘e’, and if it was an enemy ban, it got the prefix ‘eb’. Wings’ own picks got no prefix.

ebKunkka,bElder Titan,ebMirana,bDrow Ranger,eIo,Batrider,Rubick,eLifestealer,bStorm Spirit,ebRazor,bEmber Spirit,ebWeaver,Slark,eNight Stalker,Magnus,eHuskar,bDazzle,ebAncient Apparition,eOgre Magi,Pudge,39,False

This basically turned every draft into a shopping basket of heroes.
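
As an illustration of how a line like the one above could be unpacked, here is a small Python sketch (again, not the R code used in the project). It assumes the last two fields are the enemy team ID and the win flag, as described further down, and relies on hero names starting with a capital letter to tell the lowercase prefixes apart. The names parse_draft_line and PREFIX_MEANING are hypothetical.

```python
import re

# Prefix convention from the text; no prefix means a Wings pick.
PREFIX_MEANING = {"": "pick", "b": "ban", "e": "enemy_pick", "eb": "enemy_ban"}


def parse_draft_line(line):
    """Split one draft line into labelled heroes plus match information.

    ASSUMPTION: the final two fields are the enemy team ID and whether
    Wings won, written as True/False.
    """
    *tokens, enemy_team_id, wings_won = line.split(",")
    draft = []
    for token in tokens:
        # Optional lowercase prefix, then a hero name starting with a capital.
        match = re.fullmatch(r"(eb|e|b)?([A-Z].*)", token)
        prefix, hero = match.group(1) or "", match.group(2)
        draft.append((PREFIX_MEANING[prefix], hero))
    return {
        "draft": draft,
        "enemy_team_id": int(enemy_team_id),
        "wings_won": wings_won == "True",
    }


line = (
    "ebKunkka,bElder Titan,ebMirana,bDrow Ranger,eIo,Batrider,Rubick,"
    "eLifestealer,bStorm Spirit,ebRazor,bEmber Spirit,ebWeaver,Slark,"
    "eNight Stalker,Magnus,eHuskar,bDazzle,ebAncient Apparition,"
    "eOgre Magi,Pudge,39,False"
)
parsed = parse_draft_line(line)
wings_picks = [hero for kind, hero in parsed["draft"] if kind == "pick"]
```

For the rule mining itself the prefixed names can be fed in unchanged as basket items; splitting them like this is only useful for sanity checks and simple counts.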

The next data set was to answer question 3. Every pick that Wings made at TI was a line of data: the hero picked, the player who played it, and the order in which it was picked. This allowed me to explore a little deeper into the drafting order. I’ll return to this point at the end of this post, as I think there’s a bit more I could do here.

Oracle,111114687,6,39,1

In both sets of data, I also included the ID of the enemy team, and whether or not Wings won the match. I was expecting to find certain picks when Wings faced certain teams. I remember 7ckingMad once wrote a blog along the lines of “Playing your own draft vs playing against their draft” and I was interested to see if this could be explored in the data. Finally, Dota is, realistically, all about winning or losing. If Wings pick a hero 3 times, whether they win with it every time or lose with it every time makes for a completely different story. *Cough* Pudge.
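
A pick line from the second data set could be read along these lines. This is a Python sketch under assumptions: the field order (hero, player ID, pick order, enemy team ID, win flag) is inferred from the description and the sample line, and the names PickRecord, parse_pick_line and as_items are made up. Labelling each value before mining is one way to stop a pick order of 6 being confused with a team ID of 6.

```python
from typing import NamedTuple


class PickRecord(NamedTuple):
    hero: str
    player_id: int
    pick_order: int
    enemy_team_id: int
    wings_won: bool


def parse_pick_line(line):
    """Parse one pick line. ASSUMPTION: the field order is inferred."""
    hero, player_id, pick_order, enemy_team_id, won = line.split(",")
    return PickRecord(
        hero, int(player_id), int(pick_order), int(enemy_team_id), won == "1"
    )


def as_items(record):
    """Turn a record into labelled items suitable for a rule-mining basket."""
    return {
        f"hero={record.hero}",
        f"player={record.player_id}",
        f"order={record.pick_order}",
        f"enemy={record.enemy_team_id}",
        f"win={record.wings_won}",
    }


record = parse_pick_line("Oracle,111114687,6,39,1")
items = as_items(record)
```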

The findings – Wings’ Drafting

I’ll try not to talk too much about rules & boring stuff here, and keep it mainly focused on the trends I discovered. I’ve tried to make this understandable without reading the first part, it should be, but let’s face it – this is my first time, be gentle 😛

For those who skipped the preamble, here are the three questions I decided I wanted to explore through the dataset

1. Do Wings have favourite heroes, or common/repeating hero pairs? What do teams ban against Wings?
2. Is there any correlation between certain picks & certain bans, for both the enemy team and Wings?
3. Is there a specific order to Wings’ drafting style? E.g. are supports picked before carries?

Arguably Wings’ best hero is Oracle. They picked it 8 times, and won every game. Interestingly, 50% of the time it was picked with Wings’ second pick in the first phase (where you get a double pick). The other picks are dotted around a bit.

In half of Wings’ Oracle games, their opponents banned both Batrider & Huskar. Huskar makes sense in the meta, but I’d love to hear some theories behind the Batrider ban against Oracle, as this honestly stumped me.

A pair wings do like is Sand King + Oracle. Sand King was never picked without Oracle, and was played four times.

Interestingly, in Sand King’s four picks, he was played twice by iceice, as a support, and twice by Faith_Bian, as an offlaner. This intrigued me, so I went to Dotabuff. In both of iceice’s games, Sand King was labelled as a roamer by Dotabuff. In one of them innocence’s Oracle was also classed as a roamer, and in the other he babysat a Viper mid against a Huskar/Io (Suma1L/Zai). Faith_Bian’s Sand King was also classed as a roamer in one game, with iceice’s Rubick roaming with him, and the other time Oracle and Sand King sat top to contest Aggressif’s carry Void. It’s really interesting how two heroes picked together can be laned so differently.

(Match IDs here: 2569470828, 2546938575, 2551226728, 2551167223)

Last word about Oracle. He was banned 7 times against wings. They lost four of those games.

Another hero combination is Alchemist & Lifestealer. Both of these heroes were picked three times, and always together. A really interesting finding in this data was the way Wings drafted them. In 2 of the 3 games, Lifestealer was picked in the second phase and Alchemist in the last. In the other game it was reversed: Alchemist in the second phase, and Lifestealer last picked. Wings also only ran this combo when they had the absolute last pick (second pick). Once again I don’t really understand this combination of heroes. Of course I’ve done the pub strat of Lifestealer Aghs from Alchemist + Spirit Breaker, but I’d love to get some insight into why these heroes are picked together, and in such an interesting pattern too.

General stats dumps…

Wings picked Elder Titan 4 times. 3 of them were absolute first picks, and the other was their first pick when they had second pick. These were the only times iceice ever first picked himself a hero. They won every game with the world-smith. Elder Titan was banned 3 times against Wings, and they won all of those games anyway.

Drow ranger was banned 11 times against wings. It was untouched by either team only once in Wings’ entire run. In the Pudge/Techies games… Of course.

Every time the following heroes were picked, they were always played by the same players. Pudge, Invoker, Alchemist & Lifestealer (Picked 3 times), Drow Ranger, Elder Titan, Slardar & Mirana (4 times) and Oracle (8 times)

Half of the time Rubick was picked (3/6 games), he was picked in the double pick part of the second phase. Wings won every game with Rubick. Except one, where… you guessed it, they picked Pudge vs EG in the group stage.

Rubick was played by innocence in every game except one. In the game where iceice was forced onto Rubick, innocence was playing Oracle again. So, in 15/25 (60%) of Wings’ games at TI, innocence played one of just two heroes.

Pick Order Mining

Alright, onto question 3. Are particular positions picked at certain times in the draft?

First off, I have an odd one. In 75% of the games that Wings played against EG, they picked Faith_Bian’s hero with the first pick of the second phase (when they had first pick coming into that phase).

When Wings played against DC and won, they picked Faith_Bian’s hero last 60% of the time.

In 50% of games that wings lost, they picked bLink’s hero last.

Apart from that, there really doesn’t seem to be much correlation in the order in which heroes are picked. Most of these rules seem quite random and dumb. To be honest, this is a bit of a by-product of association rules: a lot of rules may be statistically sound, but pretty stupid & useless. There are rules that observe players getting their picks in certain positions, but realistically they only equate to a couple of games within the data. I’ll talk more about order mining in my evaluation at the end of this post, where I propose a better, and simpler, method of answering this question.

Pudge

Moving away from rule mining, Wings’ Pudge is super interesting. They picked Pudge in extremely important games: their first game of TI, their first series of the main event, and the first game of the grand final. There’s lots of speculation you can do around this pick. It’s almost as if they picked it to chill out before they got started. I honestly have no idea, but that’s the first question I’d ask if I ever got to interview iceice.

In this brief look at some of Wings’ drafting data we’ve spotted quite a few trends. The obvious one is Oracle: they really like that hero, and they’re really good at it. We’ve discovered a few combinations: Sand King + Oracle & Lifestealer + Alchemist. And, if you want to beat Wings, just hope they pick Pudge. There is actually a trend to Pudge games, in that they always banned Elder Titan. Sadly, it’s a useless trend.

Wings’ draft does have patterns, but honestly, we’ve barely scratched the surface here. There’s a lot more that can be done.

Further Work

Around the start of this year it was ‘revealed’ that Netflix has however many thousands of genres for their listings. (They can be found here: http://ogres-crypt.com/public/NetFlix-Streaming-Genres2.html) Back in January this gave me the idea to give Dota’s heroes these kinds of categories. This meant that rather than just analysing hero names, I could analyse teams picking pushers, anti-pushers, carries, disablers, BKB-piercing disablers, supports, position 4 supports that can also carry, magic damagers that scale into the late game… You can see my problem. Dota is incredibly diverse and, similar to Netflix’s library, lots of heroes would have lots of different categories. A ‘miscategorisation’ (not a word apparently) would lead to incorrect statistics, and I don’t think I’m good enough at Dota to categorise every hero.

Some games some heroes are picked for one of their attributes, sometimes for a few, but reducing a draft down to the roles the heroes represent could greatly strengthen drafting understanding, especially when coupled with a rule mining algorithm. It blurs the lines between trade-offs (Riki/Bounty Hunter) and enhances the understanding of the strategy beyond just the hero picks. Building a drafting AI that could replicate a specific team’s style of drafting is something I would be really interested in pursuing, but time constraints will probably leave this as a dream.



Coming back to draft order, something I will attempt later is analysing the draft in stages. Simply taking the first picking stage of a draft could lead to more information than the watered-down whole process, and later looking at the second and third phases, conditionally dependent on the earlier phases, would be interesting. Another way to tackle this problem is simply with averages. Just a simple bar chart with the pick positions, and how many times each player’s hero was picked in those positions. Here’s one for iceice and Shadow.

This is for when Wings had second pick (18/25 of their games). The first bar is actually two picks merged into one (as they are picked at the same time), but there is a clear trend that Shadow’s hero is picked second to last, maybe so any counter pick can be countered by the last pick of the game? Below is the same graph but with all 5 players. It’s not as neat, but it’s really interesting.

The most interesting part of this graph is easily innocence. His hero was always picked within the first 3 picks. That is especially interesting as he appeared to have the weakest hero pool within his team. bLink was usually either first or last, and Faith_Bian’s hero was only drafted once with the first pick of the second phase. Pretty cool graph.
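
The counts behind bar charts like these are easy to tally. Here is a minimal Python sketch, assuming the picks are available as (player, pick position) pairs; the sample data below is entirely made up and is not Wings’ real draft data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def pick_position_counts(picks):
    """Count how often each player's hero was taken at each pick position.

    picks is an iterable of (player, pick_position) pairs.
    """
    counts = defaultdict(Counter)
    for player, position in picks:
        counts[player][position] += 1
    return counts


def text_bar_chart(counter, positions):
    """Render one player's counts as a crude text bar chart."""
    return "\n".join(f"{p}: {'#' * counter[p]}" for p in positions)


# HYPOTHETICAL sample picks, for illustration only.
sample_picks = [
    ("shadow", 4), ("shadow", 4), ("shadow", 5),
    ("iceice", 1), ("iceice", 3), ("iceice", 1),
]
counts = pick_position_counts(sample_picks)
chart = text_bar_chart(counts["shadow"], positions=[1, 2, 3, 4, 5])
```

Printing chart gives one row per pick position, which is enough to eyeball whether a player’s hero tends to come early or late before reaching for a proper plotting library.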