How I collected the data

Blizzard gives everyone access to a limited view of a player’s career profile at playoverwatch.com/en-us/{Platform name}/{Battletag}. So by picking a ‘random’ battletag and looking up its career profile, you get a fairly unbiased sample of one Overwatch player. Doing this for every battletag whose username consists of five or fewer letters of the English alphabet gives a fairly large database of players, which should not be skewed much towards any particular type of player. This generated a list of 2270791 battletags that were valid at the time.
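To give a sense of the size of that search space, here is a minimal sketch that enumerates every candidate username of up to five letters. It assumes lowercase a to z only and ignores case; the function names are made up for illustration, and the post doesn’t describe how each name was turned into full battletags, so that step is left out.

```python
from itertools import product
from string import ascii_lowercase


def candidate_names(max_len, alphabet=ascii_lowercase):
    """Yield every string of length 1..max_len over the alphabet, shortest first."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def count_candidates(max_len, alphabet_size=26):
    """Number of names candidate_names() would yield, without generating them."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


if __name__ == "__main__":
    # 26 + 26^2 + 26^3 + 26^4 + 26^5 candidate names in total
    print(count_candidates(5))
```

Under these assumptions there are 12356630 candidate names, so the 2270791 valid battletags come from a much larger pool of names, most of which were presumably not in use.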

Personally, the largest bias I can think of in this dataset comes from users named after an Overwatch hero whose name is short enough to fall within the set of usernames, such as Mercy, Genji or Ana. These users may be predisposed to having more playtime on those heroes, which would inflate the representation of those heroes in this dataset compared to heroes with longer names, such as Widowmaker or Reinhardt. Regardless, I don’t think this biases the dataset too much.

The data presented here was downloaded between 10AM CET on the 26th of April (1–2 days before the end of the 9th competitive season of Overwatch) and the start of the 10th competitive season. playoverwatch.com also doesn’t always update instantly, sometimes taking a day or more to update a profile. This means that some of the profiles have changed since then, and that some of the data is somewhat out of date. But I believe (and hope) that the difference isn’t too significant. The delay mostly affects profiles that have been inactive for some time, so SR decay on a number of profiles may not be properly reflected in this dataset.

OWAPI provides a way to parse an Overwatch player’s career profile and return the data as JSON. However, OWAPI runs as a server which downloads the webpage from playoverwatch.com, performs some caching operations and parses the page to return the JSON. This wasn’t very efficient: the server’s download rate wasn’t very high, and it would crash when I sent requests at the rate I wanted. So I decided to write my own solution (I wouldn’t recommend this to anyone by the way; it’s a lot of work and a number of things on playoverwatch.com don’t really work properly).

I used aria2 to download all the webpages (in segments, as passing it a list with 2 million names would cause it to consume far too much memory). I then checked whether each player’s career profile page had downloaded successfully, and kept only the pages of players who had a skill rating, which indicated that they had completed their placement matches, as I was only interested in data from competitive mode. This solution, however, did not work too well, and my scripts for filtering the webpages crashed a number of times. As a result, the total number of webpages which could be downloaded and accounted for without issues dropped to 1567222. Fortunately, none of the files containing data from players who had completed their placement matches were deleted.
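The “in segments” step can be sketched roughly as follows: split the battletag list into chunks and write one URL list per chunk, so that aria2 only ever sees a small input file. This is an illustration, not the script I actually used. The chunk size, file names and helper names are hypothetical, and the URL template simply follows the pattern given earlier, with the battletag inserted as-is and no escaping.

```python
from pathlib import Path

# Pattern taken from the profile address described earlier in this post.
URL_TEMPLATE = "https://playoverwatch.com/en-us/{platform}/{battletag}"


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_input_files(battletags, out_dir, platform="pc", chunk_size=10_000):
    """Write one URL-per-line file per chunk and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunked(battletags, chunk_size)):
        path = out_dir / f"urls_{index:05d}.txt"
        lines = [URL_TEMPLATE.format(platform=platform, battletag=tag)
                 for tag in chunk]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Each file can then be passed to aria2 on its own, for example with `aria2c -i urls_00000.txt -d pages/`, where `-i` reads the URL list from a file and `-d` sets the download directory.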

After this I had to extract the career profile data from all of these pages. OWAPI only works for getting data directly from webpages served by playoverwatch.com, and in my case the webpages containing career profiles were stored locally. So I decided to fork OWAPI and strip it down to a library consisting of only its parsing code. A number of issues arose here as well, which reduced the number of career profiles that could be accounted for to 1329189. After getting rid of all the career profiles of players who hadn’t completed their placement matches, I was left with 426920 profiles.

There is, however, one large issue with playoverwatch.com that I haven’t mentioned thus far. If a player finishes their placement matches and then never touches the game again, whether that be quick play or competitive, the website will show their competitive stats for the last competitive season they played. So among all the profiles I collected, a significant number may be outdated and not representative of players who completed their placement matches in season 9. One way to make sure that a profile was showing competitive mode data from season 9 was to check whether it had any playtime in quick play as Brigitte (she was not available in competitive mode until season 10). Brigitte was added to the live version of Overwatch on the 20th of March 2018, which means that every profile with any playtime on Brigitte registered on playoverwatch.com has been updated since then, so its competitive stats are valid season 9 stats. However, this does mean that anyone who placed in season 9 but hadn’t played as Brigitte by the time I collected this data is excluded from the dataset.
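In code, the Brigitte check amounts to a simple filter. The sketch below assumes each parsed profile is a dictionary with quick play playtime per hero stored under profile["quickplay"]["playtime"]; that layout and the function names are hypothetical and not necessarily what OWAPI produces.

```python
def has_brigitte_playtime(profile):
    """True if the profile shows any quick play time on Brigitte."""
    playtime = profile.get("quickplay", {}).get("playtime", {})
    return playtime.get("brigitte", 0) > 0


def keep_season_9_profiles(profiles):
    """Keep only the profiles known to have been updated after Brigitte's release."""
    return [p for p in profiles if has_brigitte_playtime(p)]


if __name__ == "__main__":
    sample = [
        {"name": "A", "quickplay": {"playtime": {"brigitte": 0.5, "mercy": 12}}},
        {"name": "B", "quickplay": {"playtime": {"mercy": 40}}},
        {"name": "C"},  # no quick play data at all
    ]
    print([p["name"] for p in keep_season_9_profiles(sample)])
```

Profiles with missing quick play data are treated as having no Brigitte playtime, which matches the conservative choice described above: anything that can’t be shown to be current is dropped.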

After this I was left with 122414 profiles. While requiring playtime as Brigitte for inclusion in this dataset does alter it significantly, I don’t think it’ll introduce significant bias to some statistics, such as the skill rating distribution. What it likely could bias are statistics such as a player’s total playtime or what kind of heroes they play.
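To put the successive reductions in perspective, the short script below computes how much of the data survived each step, using only the counts quoted in this post. The stage labels are my own shorthand.

```python
# Counts quoted in this post; the labels are informal descriptions.
STAGES = [
    ("valid battletags", 2270791),
    ("pages downloaded and accounted for", 1567222),
    ("profiles parsed", 1329189),
    ("completed placement matches", 426920),
    ("has Brigitte playtime", 122414),
]


def retention(stages):
    """Return (label, count, share of previous stage, share of first stage)."""
    rows = []
    first = stages[0][1]
    previous = first
    for label, count in stages:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, step, overall in retention(STAGES):
        print(f"{label:<36} {count:>9,} {step:>7.1%} {overall:>7.1%}")
```

The final dataset is about 5.4% of the original battletag list, and the Brigitte requirement alone keeps roughly 28.7% of the profiles that had completed their placement matches.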