Weds. Jun 21st:

Intro:

This essay will defend a vastly simpler implementation of Skill Rating adjustment than currently exists in Overwatch’s Ranked Matchmaking. I will suggest that removing all influencers of Skill Rating besides winning & losing (adjusted to game difficulty) will result in a number of improvements to the Ranked Matchmaking experience, especially with an eye towards the OWL and the eSports possibilities for Overwatch in general.

Incentives & Behavior:

Most game-theoretic models begin with a simple assumption termed ‘rational self-interest’: the idea that individuals will take the course of action that most benefits themselves. This assumption is imperfect, as humans have been repeatedly shown to exhibit altruistic and pay-to-punish behavior patterns in empirical studies. However, broadly speaking, the notion that people will act in service of their own goals is a plausible one. It is especially so in an online context that lacks face-to-face empathic accountability.

Beginning from rational self-interest, then, we can understand and predict the behavior patterns of players in Overwatch by examining the incentive structures that they face. Furthermore, alterations to these incentive structures have the power to dramatically change the decisions players make and even the mindset with which individuals approach the game.

The clearest and most impactful incentive that Overwatch players (or at least those who choose to play Ranked Matchmaking) face is Skill Rating (hereinafter ‘SR’). Rising through the ranks feels satisfying and validating, placing in a top division can be a status symbol, and a high top-500 placement might even land you tryouts to play professionally. Naturally, then, many players are highly incentivized to maximize their SR.

Skill Rating Maximization:

SR maximization will always be an incentivized behavior pattern. People want to be highly skilled, but more than that they want to appear to be highly skilled. This distinction seems small but is in fact very important. Crucially then, the key motivation for many (especially for the vast majority of players who will never compete in an eSports context) is to reach the highest SR that they can. This should be juxtaposed against the incentive to become the best player one can be: seeking to have the maximum impact upon a given team’s win probability (i.e. the eSports motivation).

Ideally then, the SR system should be set up such that ‘SR maximization behavior’ guides players to make the sort of decisions that positively impact the community and create the best gameplay environment possible. In my judgement, such an ideal system would align the SR maximization behavior with the eSports motivation, especially with an eye towards the Overwatch League. The current system fails to accomplish this alignment.

One Trick Players (OTPs):

While ‘one-tricking’ is not a behavior that I think should be actively discouraged or disallowed, I contend that it’s also a behavior that shouldn’t be specifically incentivized. In my view, the ideal system would be entirely indifferent towards OTPs.

Consider a hypothetical Mercy OTP (anecdotally the most commonly one-tricked hero, although I don’t have data that support this) who has reached a very high SR with essentially no other heroes played.

The current SR system rewards players who are playing at a high skill percentile compared to other players on that hero. This comparison is drawn not within one game instance, but rather across the entire dataset of all Ranked Matchmaking time played on that hero. What this means for our hypothetical Mercy OTP is that, so long as he/she plays better than other Mercy players, lost games will net a smaller SR drop and won games will net a larger SR gain. This impact is so significant that winning vs. losing is in fact a secondary concern to the ‘Mercy percentile’ our OTP is playing at.
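To make the size of this effect concrete, here is a toy model of a percentile-adjusted SR change. Blizzard's actual formula is not public, so everything in it is an assumption made purely for illustration: the function names, the base value of 25 SR, the `weight` parameter, and the linear shape of the modifier. The sketch only shows how a performance modifier of this general kind lowers the win rate a player needs in order to hold a given SR.

```python
def sr_change(won: bool, percentile: float,
              base: float = 25.0, weight: float = 0.5) -> float:
    """Hypothetical SR change for one game (not Blizzard's real formula).

    percentile: where the player ranks among all players of that hero,
                from 0.0 (worst) to 1.0 (best); 0.5 is an average player.
    base:       assumed SR swing for an average performance.
    weight:     assumed strength of the performance adjustment.
    """
    # Modifier ranges from (1 - weight) to (1 + weight).
    modifier = 1.0 + weight * (percentile - 0.5) * 2.0
    if won:
        return base * modifier            # better stats: larger gain
    return -base * (2.0 - modifier)       # better stats: smaller loss


def break_even_win_rate(percentile: float,
                        base: float = 25.0, weight: float = 0.5) -> float:
    """Win rate at which expected SR change per game is zero."""
    gain = sr_change(True, percentile, base, weight)
    loss = -sr_change(False, percentile, base, weight)
    return loss / (gain + loss)


if __name__ == "__main__":
    for p in (0.5, 0.9):
        print(p, round(break_even_win_rate(p), 3))
```

Under these made-up numbers, an average performer needs to win half of his/her games to hold SR, while a 90th-percentile performer on a single hero holds SR while winning only about 30% of them. The real magnitudes are unknown; the direction of the effect is what matters for the argument.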

We’ll get back to our hypothetical OTP in a moment, but now let’s take a step back to examine the bigger picture. The current SR system is crucially problematic for many reasons, but I’ll focus on two: (1) statistical judgements of skill are weak (for some heroes more than others) and (2) it leads different players to have different incentive structures.

(1) Statistical Judgements of Skill Are Weak:

The strength of this proposition is such that I’ll use the best counterexample as my own starting point: McCree. He is a hero with extremely low utility, extremely low survivability, and extremely high damage potential. A player with high accuracy, high damage per minute, and few deaths per minute is very likely to be a higher impact player than someone with weaker statistics. Such a player is minimizing McCree’s weaknesses (i.e. avoiding death) while playing to his strengths (high damage output). It is very likely that such a player is contributing more to an average game than a player with worse statistics. Even for McCree, though, these statistics are imperfect. Is a given player’s damage relevant? How often is he/she spamming enemy heroes without any plausible follow up (i.e. feeding ultimate charge to enemy supports)? A player who hits a few precise shots to pick a key player at a key moment (e.g. a support at the beginning of the fight or a DPS who is preparing to ult) is inarguably much more impactful to securing wins than one who merely sits in the back making poor focus decisions, yet the latter player would be statistically superior by the previously stated standards.

We can apply this same analysis to quite a few heroes, revealing that statistical judgements of skill become weaker and weaker as we move from the most mechanically demanding heroes in the roster to those with very little ‘traditional FPS skill’ requirements. Even a hero such as Roadhog demands a deeper statistical evaluation to really get at skill. One must weigh damage per minute and survivability against damage taken, as a great Roadhog knows how to minimize his exposure and with it the rate at which he feeds the enemy team ultimate. There is no magic formula to successfully achieve such a balancing act. How can one statistically capture the impact of a Whole Hog that prevents a Dragonblade and a Primal Rage from destroying one’s backline (while doing very little damage and earning no kills)? In a game as complex and decision-rich as Overwatch, I don’t see a way that these judgements can be made accurately and reliably by a predetermined formula.

The ultimate example of how useless statistical measurements of skill are, and how bad percentile-based SR adjustment can be, is of course my favorite foil, Mercy. The impact of virtually every aspect of Mercy’s kit is poorly captured by statistical measurements. Hitting a 5-player Resurrection that is answered by a 6-player Earth Shatter or Graviton Surge is in fact game-losing. The statistics show a high ‘resurrected players per ultimate cast’ while the reality in game is that the enemy team just farmed MULTIPLE new ultimates. The entire HP pool of your composition just went into the enemy team’s ultimate bank TWO TIMES OVER. I can’t really overstate how bad it is to make a poor decision about using Resurrection. In these cases, not only would it have been better to save one’s own ultimate, but it would also have been better to disconnect from the server and let your team play 5v6, because at least then you would have had a chance to swing Ultimate tempo. Even if there is no immediate Ult-response to a big Resurrection, if your team fails to win the fight the situation is the same: a massive Ultimate tempo swing to the opposing team. Very often, the most impactful Resurrections are instant casts to revive one key player who just died (because the opposing team has often expended cooldowns and cannot kill them again). Thus, playing to maximize the statistical measurements of Resurrection (i.e. waiting for a big Res) is in fact seriously detrimental to the success of the team.

Resurrection is furthermore a relatively weak support ultimate because it requires your teammates’ deaths instead of preventing them as all of the others do (once again Symmetra is not a support). Thus a very smart Mercy player actually chooses not to heal in many scenarios so that her support partner can get his/her ultimate faster. Heals per minute is therefore a fickle statistic whose maximization does not reliably communicate skillful or intelligent play.

Low deaths per minute and high damage boosted are the only statistical measurements of Mercy play that I see as actually meaningful, as these statistics communicate intelligent play and impact maximization. Solo kills with the pistol are also probably quite meaningful, but of course a Mercy player who seeks these out at poor times would be called a thrower. It’s not that Mercy is a ‘no-skill hero’; the key problem is that skillful Mercy play is almost never communicated by impressive stats. Even the statistics I mention as meaningful don’t come close to telling the whole story of player skill and game impact.

(2) Failure to Align Incentives:

Not only are OTPs highly incentivized by the current SR system to continue one-tricking and to play for statistical maximization over wins and losses, but these incentives are also directly opposed to the incentive structure that flexible players face. A flex player knows that he/she won’t be playing at the far right tail of his/her heroes’ skill distributions because his/her mastery of the game is spread across many heroes and many situations. The flex player seeks to achieve a high SR by playing the perfect hero imperfectly while the OTP seeks to achieve a high SR by playing the imperfect hero perfectly. While I don’t think that either of these strategies is deserving of punishment, I think that it’s important that the system not prioritize one over the other at any echelon of SR.

In the current system, the flexible player must maintain a higher win percentage (abstracting away from game difficulty) to reach the same SR as the OTP. This is deeply problematic in my eyes, as I see hero swapping as a fundamental part of the game. If an OTP doesn’t wish to engage with hero swapping as a part of gameplay, that’s fine, but their SR should reflect that choice. The same goes for players who don’t wish to engage with communication as a fundamental part of the game: you don’t have to talk, but if you lose games because of it then that is on you and ought to be reflected in your Skill Rating. A truly great player has the knowledge, intelligence, and decisiveness to pick the right hero for the right situation, filling in the gaps of his/her team composition while at the same time countering opposing composition decisions. Not every player has to aspire to be the greatest player of all time, but in my view the entire purpose of having a Skill Rating system to begin with is to measure and validate that very pursuit of greatness.

Suggestion:

Incentive alignment is a goal well worth pursuing. When all players have the same goals, the potential for toxicity is greatly diminished (though certainly not eliminated). I personally find it quite frustrating to queue into Ranked Matchmaking with the goal of winning games, only to find that other players do not share the same incentives. At the very top of the Skill Rating system, one should find other players who want to win games, not those who wish to engage in roleplay. This isn’t to say that OTPs can’t be good or impactful to winning games; my argument is rather that OTPs should be judged by their wins and losses rather than by the extent to which they engage in one-tricking. The current system punishes adaptation and experimentation vastly more than it needs to.

There is only one way to guarantee that every player has the same incentive: strip away all of the hidden formulas and percentile adjustments. Only when every player has a single incentive, to win, will incentive alignment truly come about. The only thing that should affect the SR consequences of a win or a loss is the relative skill of the two teams. Win a hard game and you should clearly be rewarded more than for winning an easy one; likewise, losing an easy game should cost you more than losing a hard one.
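One way to sketch such a win-loss-only adjustment is an Elo-style update, in which the SR change depends solely on the result and on the gap between the two teams' ratings. This is a stand-in chosen for illustration, not a description of Blizzard's system or of any particular game's MMR. The function names, the 400-point scale, the K-factor of 50, and the use of a single average SR per team are all assumptions.

```python
def expected_win_probability(team_sr: float, opponent_sr: float,
                             scale: float = 400.0) -> float:
    """Elo-style chance that `team_sr` beats `opponent_sr`.

    Both arguments are assumed to be the average SR of each team.
    `scale` (assumed 400) sets how quickly an SR gap turns into a favorite.
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent_sr - team_sr) / scale))


def sr_delta(team_sr: float, opponent_sr: float, won: bool,
             k: float = 50.0) -> float:
    """SR change from one game, using only the result and game difficulty.

    `k` (assumed 50) is the largest possible swing for a single game.
    No individual statistics enter the calculation.
    """
    expected = expected_win_probability(team_sr, opponent_sr)
    actual = 1.0 if won else 0.0
    return k * (actual - expected)


if __name__ == "__main__":
    print(sr_delta(2500, 2500, won=True))             # even game, win
    print(round(sr_delta(2500, 2700, won=True), 1))   # hard game, win
    print(round(sr_delta(2500, 2300, won=True), 1))   # easy game, win
    print(round(sr_delta(2500, 2300, won=False), 1))  # easy game, loss
```

With these assumed parameters, winning as the underdog pays more than winning as the favorite, and losing as the favorite costs more than losing as the underdog. Every player on a team receives the same change, so the only way to gain SR is to win.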

The meaningfulness of Skill Rating is especially important as it is the only clearly available measurement of player skill outside of actual eSports experience. With the Overwatch League on the horizon, now is the time to restructure the system such that the very best rise to the top and have a fair shot at becoming professionals. Right now, the only way to scout talent is to do it on an individual, observational basis. Look at Dota 2: you will see fresh talent rising out of Ranked Matchmaking and being given a shot at a professional career simply for reaching the very top of the ladder. That’s because their MMR system answers exactly one question: ‘how good are you at winning difficult games?’

If I worked at Blizzard, I’d be demanding a HARD Skill Rating reset at the end of this season and an entirely purified win-loss SR adjustment regime going forward. If Blizzard really wants the best of the best to get their chance at fame and fortune in eSports, then there really is only one way.

Counterarguments:

Percentile SR adjustment exists primarily, in my understanding, to combat smurfing (the purchasing of new accounts to play at a lower level than one’s true skill). Want to get serious about smurfing, Blizzard? Run IP and MAC checks on new accounts and tag them for evaluation, and add a report option for suspected smurfs to cross-reference against. If you can statistically target and punish throwers, then there is no reason you can’t statistically target and adjust smurf accounts. It’s fine if statistical adjustments are used in exceptional and targeted cases; just get rid of them as the default for the entire player base.

“But I wanna one trick!” Go right ahead. No one can (or should) stop you. But if you lose games because of it, don’t expect special treatment. OTPs don’t deserve punishment, but they certainly don’t deserve specific rewards over players who choose to engage with hero-swapping as a fundamental and crucially necessary mechanic in Overwatch. This is especially the case as Blizzard is beginning to employ SR as a way to qualify for tournaments (see: OW Open) and they seem to be considering it as a potential scouting mechanic for new talent once the scene is more established.

To Blizzard: fix it now, or condemn the eSports potential of Overwatch in the long run.

EDIT: An earlier version of this article referenced Contenders as an example of an SR-gated tournament. This is inaccurate, as Contenders was never SR restricted. Rather, it is the Overwatch Open for which Blizzard is requiring a certain SR.