This three part project aims to model League of Legends matches into Markov Decision Processes then apply Reinforcement Learning to find the optimal decision that also account for a player’s preference and go beyond simple ‘score board’ statistics.

I have made each part available on Kaggle to give a better understanding of how the data is processed and the model is coded. I have included the first two parts to give some clarity in the reasoning behind my final decisions on how the environment is modelled.

Part 1: https://www.kaggle.com/osbornep/lol-ai-model-part-1-initial-eda-and-first-mdp

Part 2: https://www.kaggle.com/osbornep/lol-ai-model-part-2-redesign-mdp-with-gold-diff

Part 3: https://www.kaggle.com/osbornep/lol-ai-model-part-3-final-output

This is very much a work in progress and is only designed to introduce the idea of what can be achieved if more complex machine learning methods are introduced into the game that go beyond simple summary statistics as shown in the image below.

Motivations and Objectives

League of Legends is a team oriented video game where on two team teams (with 5 players in each) compete for objectives and kills. Gaining an advantage enables the players to become stronger (obtain better items and level up faster) than their opponents and, as their advantage increases, the likelihood of winning the game also increases. We therefore have a sequence of events dependent on previous events that lead to one team destroying the other’s base and winning the game.

Sequences like this being modelled statistically is nothing new; for years now researchers have considered how this is applied in sports, such as basketball (https://arxiv.org/pdf/1507.01816.pdf), where a sequence of passing, dribbling and foul plays lead to a team obtaining or losing points. The aim of research such as this one mentioned is to provide more detailed insight beyond a simple box score (number of points or kill gained by player in basketball or video games respectively) and consider how teams perform when modelled as a sequence of events connected in time.

Modelling the events in this way is even more important in games such as League of Legends as taking objectives and kills lead towards both an item and level advantage. For example, a player obtaining the first kill of the game nets them gold that can be used to purchase more powerful items. With this item they are then strong enough to obtain more kills and so on until they can lead their team to a win. Facilitating a lead like this is often referred to as ‘snowballing’ as the players cumulatively gain advantages but often games are not this one sided and objects and team plays are more important.

The aim of this is project is simple; can we calculate the next best event given what has occurred previously in the game so that the likelihood of eventually leading to a win increases based on real match statistics?

However, there are many factors that lead to a player’s decision making in a game that cannot be easily measured. No how matter how much data collected, the amount of information a player can capture is beyond any that a computer can detect (at least for now!). For example, players may be over or under performing in this game or may simply have a preference for the way they play (often defined by the types of characters they play). Some players will naturally be more aggressive and look for kills while others will play passively and push for objectives instead. Therefore, we further develop our model to allow the player to adjust the recommended play on their preferences.

What makes our model ‘Artificial Intelligence’?

In out first part, we performed some introductory statistical analysis. For example, we were able to calculate the probability of winning given the team obtains the first and second objective in a match as shown in the image below.

There are two components that make takes our project beyond simple statistics into AI:

First, the model learns which actions are best with no pre-conceived notion of the game and

Secondly, it attempts to learn a player’s preference for decisions that will influence the model’s output.

How we define our Markov Decision Process and collect a player’s preference will define what our model learns and therefore outputs.

Pre-Processing and Creating Markov Decision Process from Match Statistics

AI Model II: Introducing Gold Difference

I then realised from the results of our first model attempts that we have nothing to take into account the cumulative impact negative and positive events have on the likelihood in later states. In other words, the current MDP probabilities are just as likely to happen whether you are ahead or behind at that point in time. In the game this simply isn’t true; if you are behind then kills, structures and other objectives are much harder to obtain and we need to account for this. Therefore, we introduce gold difference between the teams as a way to redefine our states. We now aim to have a MDP defining the states to be both the order events occurred but also whether the team is behind, even or ahead in gold. We have categorised the gold difference to the following:

Even: 0–999 gold difference (0–200 per player avg.)

Slightly Behind/Ahead: 1,000–2,499 gold difference (200–500 per player avg.)

Behind/Ahead: 2,500–4,999 gold difference (500–1,000 per player avg.)

Very Behind/Ahead: 5,000 gold difference (1,000+ per player avg.)

We also now consider no events to be of interest and include this as ‘NONE’ event so that each minute has at least one event. This ‘NONE’ event represents if a team decided to try stalling game and helps differentiate teams that are better at obtaining a gold lead in the early game without kills or objectives (through minion kills). However, doing this also massively stretches our data thin as we have now added 7 categories to fit the available matches into but if we had access to more normal matches the amount of data would be sufficient. As before, we can outline each step by the following:

Pre-processing

Import data for kills, structures, monsters and gold difference. Convert ‘Address’ into an id feature. Remove all games with old dragon. Start with gold difference data and aggregate this by minute of event, match id and team that made event as before Append (stack) the kills, monsters and structures data onto the end of this creating a row for each event and sort by time event occurred (avg. for kills). Add and ‘Event Number’ feature that shows the order of events in each of the matches. Create a consolidated ‘Event’ feature with either kills, structures, monsters or ‘NONE’ for each event on the row. Transform this into one row per match with columns now denoting each event. Only consider red team’s perspective so merge columns and where blue gains become negative red gains. Also add on game length and outcome for red team. Replace all blank values (i.e. game ended in earlier step) with the game outcome for the match so that the last event in all rows is the match outcome. Transform into MDP where we have P( X_t | X_t-1 ) for all event types in between each event number and state defined by gold difference.

Markov Decision Process Output

Model v6 Pseudocode in Plain English

Our very final version of the model can be simply summarised by the following:

Introduce parameters Initialise start state, start event and start action Select actions based on either first provided or randomly over their likelihood of occurring as defined in MDP When action reaches win/loss, end episode Track the actions taken in the episode and final outcome (win/loss) Update value of actions taken based on final outcome using update rule Repeat for x number of episodes

Introducing Preferences with Rewards

First, we adjust our model code to include the reward in our Return calculation. Then, when we run the model, instead of simply having a reward equal to zero, we now introduce a bias towards some actions.

In our first example, we show what happens if we heavily weight an action positively and then, in our second, if we weight an action negatively.

Output if we provide a strong POSITIVE reward for action: ‘+KILLS’

Output if we provide a strong NEGATIVE reward for action: ‘+KILLS’

More realistic player preferences

So let us attempt to approximately simulate a player’s actual preferences. In this case, I have randomised some of the rewards to follow the two rules:

The player doesn’t want to give up any objectives The player prioritises gaining objectives over kills

Therefore, our rewards for kills and losing objects are all the minimum of -0.05 whereas the other actions are randomised between -0.05 and 0.05.

Output with randomised player rewards

Output with randomised player rewards for all actions

Final output showing the value of each action given our current gold difference state and minute

Conclusion and Collecting Feedback from Players for Rewards

I have vastly oversimplified some of the features (such as ‘kills’ not representing the actual amount of kills) and the data is likely not representative of normal matches. However, I hope that this demonstrates an interesting concept clearly and encourages discussion to start about how this could be developed further.

First, I will list the main improvements that need to be made before this could be viable for implementation:

Calculate the MDP using more data that represents the whole player population, not just competitive matches. Improve the efficiency of the model so that it can calculate in a more resonable time. Monte Carlo is known for being time consuming so would explore more time efficient algorithms. Apply more advanced parameter optimisation to further improve the results. Prototype player feedback capture and mapping for a more realistic reward signal.

We have introduced rewards for influencing the model output but how is this obtained? Well there are a few ways we could consider but, based on my previous research, I think the best way is to consider a reward that considers both the individual quality of the action AND the quality of transitioning.

This becomes more and more complex and not something I will cover here but, in short, we would like to match a player’s decision making in which the optimal next decision is dependent on what just occured. For example, if the team kills all players on the enemy team, then they may push to obtain Baron. Our model already takes in to account the probability of events occuring in a sequence so we should also consider a player’s decision making in the same way. This idea is drawn from the following research which explains how the feedback can be mapped in more detail (https://www.researchgate.net/publication/259624959_DJ-MC_A_Reinforcement-Learning_Agent_for_Music_Playlist_Recommendation).

How we collect this feedback defines how successful our model will be. In my eyes, the end goal of this would be to have real time recommendations for players on the next best decision to make. The player would then be able to select from the top few decisions (ranked in order of success) given the match statistics. This player’s choice can be tracked over multiple games to further learn and understand that player’s preferences. This would also mean that not only could we track the outcome of decisions but would also know what that player attempted to achieve (e.g. tried to take tower but was killed instead) and would open up information for even more advanced analytics.

Of course, an idea like this may cause complications with team mates disagreeing and perhaps take an element out of the game that makes it exciting. But I think something like this could greatly benefit players at a lower or normal skill level where decision making between the players are difficult to communicate clearly. It could also help identify players that are being ‘toxic’ by their actions as teams would look to agree the play via a vote system and it can then be seen whether the toxic player consistently ignores their team mates by their movements instead of following the agreed plans.