Parity


What does trying to count the types of trees in a forest have to do with competitive PUBG? This week, I’m looking at parity in the NPL.

Parity is the level of fairness or competitiveness in a sports league. If a league has high parity, teams have roughly equivalent levels of talent, with a small gap between the worst and best teams, and every team has a similar chance of winning. Last time, I went back over the standings throughout NPL Phase 2, where the end results were very close.

High parity isn’t necessarily good or bad. When parity is high, games are more competitive and the winner can’t easily be predicted in advance, which can be more fun to watch. But in a league like the NHL, which has comparatively high parity among the major North American sports, that also means luck, in the form of bad puck bounces or bizarre officiating, can have a lot of influence on a series – too much for some viewers. And in the NBA, a league with relatively low parity, the better team tends to win every playoff series – which makes upsets all the more exciting.

I’ve seen people saying that the second phase of the NPL was more equal than the first phase, with more of a chance for any given team to do well, because of the influx of top teams from NPL Contenders.

Is that true? Has parity increased from Phase 1 to Phase 2? Were the results really that much closer this phase than last? And how strong is the competition in the NPL compared to other competitive PUBG leagues worldwide?

Testing these questions got away from me a bit because there are a lot of interesting analytical parallels to the way we evaluate biodiversity in ecology, my field of study. Let’s take a detour.


BIODIVERSITY


A black bear relaxing with a snow depth measurement stick, in a field of Labrador tea and trembling aspen, captured by a camera trap at one of my study sites in northern Alberta, Canada. Courtesy of the Alberta Biodiversity Monitoring Institute, 2018.


What is biodiversity and why do we care about it? Biodiversity is the variety of living organisms in a given area. That area can range from your backyard to the entire planet, and biodiversity can change from place to place based on environmental conditions like temperature. Globally, biodiversity is under threat due to human activities – I’m sure this isn’t news to anyone.

Ecologists are interested in biodiversity in order to answer questions like: what are the benefits of a biodiverse world for human society? Or, on a smaller scale, do food crops grow better when there are more species of pollinators around? We can answer these questions by comparing measurements of biodiversity under different conditions.

These questions about analysing the complexity of different systems are the same, mathematically, as questions about parity in an esports league.

There are a few methods ecologists use to assess biodiversity, but I’m going to focus on just two, which are also commonly used in economics and sports analytics, to find out if NPL Phase 2 had higher parity than Phase 1.

First, I’m going to visualize parity in both phases so far using Lorenz curves.


LOOKING AT PARITY IN THE NPL


A Lorenz curve is a graphical representation of inequality, developed in 1905 by American economist Max Lorenz to illustrate unequal distributions of wealth in a country.

It has a lot of uses in ecology, but for PUBG, it can be used to show the difference between the actual distribution of points (the curve), and what the points distribution would be if all teams were exactly equal and earned the same number of points (the dotted line).


Lorenz curves for NPL Phase 1 and Phase 2.


What does this graph actually show? The horizontal x-axis is the cumulative proportion of teams – the curve adds each team one at a time, from the lowest-scoring team on the left to the highest-scoring team on the right. The vertical y-axis shows what we’re interested in: the cumulative proportion of standings points earned by those teams, out of all points earned by every team in the league.

For example, the point at x ≈ 0.63 and y ≈ 0.5 for Phase 1, in purple, tells us that 63% of the teams in the league earned only half of the total standings points. Under perfect parity (along the dotted line), where each team performed equally well, those bottom 63% of teams would together have earned 63% of the points.

There isn’t a huge change between Phase 1 and Phase 2, but the curves are different, with Phase 2 showing slightly more parity. The Gini coefficient – the area between the curve and the dotted line, as a proportion of the entire area under the dotted line – is 0.159 for Phase 1, and 0.122 for Phase 2 (smaller number = more parity).
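As a sketch of the computation – in Python rather than the R I actually used, and with made-up point totals rather than the real NPL standings:

```python
import numpy as np

def lorenz_gini(points):
    """Lorenz curve coordinates and Gini coefficient for a
    vector of per-team standings points."""
    pts = np.sort(np.asarray(points, dtype=float))  # lowest-scoring team first
    x = np.concatenate([[0.0], np.arange(1, len(pts) + 1) / len(pts)])
    y = np.concatenate([[0.0], np.cumsum(pts) / pts.sum()])
    # Area under the Lorenz curve via the trapezoid rule;
    # Gini = 1 - 2 * (that area), so perfect equality gives 0
    area = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
    return x, y, 1.0 - 2.0 * area

# Made-up 4-team league with equal points: Gini of 0
_, _, gini = lorenz_gini([100, 100, 100, 100])
```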

Unfortunately, I don’t know whether or not this difference is significant. I started to look at other methods of definitively answering this question.

The standard method for evaluating parity in sports economics journals is the “relative standard deviation” (not to be confused with the concept in statistics with the same name), a measure of how different the actual results in a league were from the hypothetical results under a scenario where that league has complete parity (Cain & Haddock 2006, Trandel & Maxcy 2011, Owen 2012, Lopez 2017).

This calculation is straightforward enough in a sport like baseball, where a team earns one point if they win and no points if they lose, and each team’s likelihood of winning under complete parity is 50%. It’s a bit more difficult for a sport like hockey, where a team can earn 2, 1, or 0 standings points each game depending on whether they win, lose in overtime, or lose in regulation – and I don’t know how I would even begin calculating it for PUBG, where the points system includes a lot of events that happen within the game, beyond just who wins. I decided not to go that route.
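For a baseball-style win/loss league, the calculation looks roughly like this – a sketch in Python, where the function name and example win percentages are mine, not from any of the cited papers:

```python
import math

def relative_sd(win_pcts, games_per_team):
    """Ratio of the observed spread in win percentages to the spread
    expected under complete parity, where every game is a coin flip."""
    n = len(win_pcts)
    mean = sum(win_pcts) / n
    observed_sd = math.sqrt(sum((w - mean) ** 2 for w in win_pcts) / n)
    ideal_sd = 0.5 / math.sqrt(games_per_team)  # binomial SD of a .500 team
    return observed_sd / ideal_sd

# A league where every team goes .500 has a relative SD of 0;
# larger values mean less parity
rsd = relative_sd([0.600, 0.550, 0.450, 0.400], games_per_team=162)
```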


PARITY AND BIODIVERSITY – WORKING WITH WHAT I KNOW


There are some equations coming up, but if that’s not your thing, you can just skim over them and go two sections ahead to see the results!

Another option for assessing parity that came up in my research was the Herfindahl–Hirschman index, a measure of market concentration in economics. I looked into this a bit more, because it seemed familiar.

As it turns out, it’s the same formula as something I know well: a measure of biodiversity known as Simpson’s index, or the Simpson concentration index, developed by Edward Simpson in 1949. The Herfindahl–Hirschman index was independently discovered a year after Simpson’s work.

Out of all the points earned throughout an entire NPL phase by all teams, if you take two points at random – let’s say one is a kill point from match 5, and one is a placement point from match 21 – Simpson’s index, \(\lambda\), is the probability that these points were earned by the same team. Or, if you’re looking at a patch of boreal forest, Simpson’s index is the probability that two randomly selected trees belong to the same species.

\[\lambda = \sum_{i=1}^S p_i^2\]

To calculate \(\lambda\), you first count the number of individuals of each species, or points earned per team, as well as the total number of species or teams. \(p_i\) is the proportion of individuals or points belonging to the \(i\)th species or team, out of all individuals or all points earned by all teams.
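In code, the whole calculation is only a few lines – a Python sketch with invented per-team point totals:

```python
def simpson_lambda(counts):
    """Probability that two randomly drawn points (or individuals)
    belong to the same team (or species)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

# Four equally successful teams: lambda = 1/4, the lowest
# possible concentration for four teams
lam = simpson_lambda([100, 100, 100, 100])  # -> 0.25
```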

Simpson’s index is useful, but it’s tough to compare between multiple measures of this index. If I’m trying to see if parity increased in the NPL from one phase to the next, what does the difference between a Simpson’s index of 0.932 in Phase 1 and 0.935 in Phase 2 mean in terms of the actual teams and their competition?

The index doesn’t scale linearly with increasing diversity or parity, and it’s not intuitive to interpret. At high levels of this index close to 1, small changes can represent drastic differences in parity – or not.


A BETTER MEASURE OF PARITY


The majority of metrics in ecology for describing biodiversity, like Simpson’s index, are based on the concept of entropy in information theory. Entropy is the disorder in a system, the degree of uncertainty associated with predicting bits of information – like determining whether individuals drawn from a community are the same or different species.

To address the issues of non-linearity and interpretability with these entropy-based indices, Lou Jost (2006) proposed that they should be converted into a “true” measure of diversity called the “effective number of species” (MacArthur 1965), also known as a Hill number (Hill 1973), which is the number of equally abundant species necessary to produce the observed value of diversity.

To put this another way, the conversion to effective species finds the hypothetical community that has the same diversity index as the community you’ve actually observed, but where every species has the same number of individuals instead of some being rare and some common – or the hypothetical league that has the same level of parity as the actual league, but where all teams earn the exact same number of points – and counts the number of species or teams in that hypothetical group.

This measure scales linearly, so if the community is twice as diverse when you come back and sample it again, the effective number of species is also doubled. It makes it easier to meaningfully compare two measurements.

All the conversions of entropy-based metrics to “true” diversity are just special cases of the same overall formula, because these metrics are all related to each other.

The expression for this “true” diversity is:

\[^qD = \left( \sum_{i=1}^S p_i^q \right) ^{\frac{1}{(1 - q)}}\]

where \(^qD\) is the effective number of species, and \(q\) is the “order” of the equation. When \(q = 0\), the result is just the number of different species that you counted, with no information about their relative rarity.
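A direct Python translation of the formula, with the \(q = 1\) case handled as its own limit (the exponential of Shannon entropy), since the exponent \(1/(1-q)\) blows up there – the example counts are invented:

```python
import math

def hill_number(counts, q):
    """Effective number of species (or teams) of order q."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))

# q = 0 just counts the species present, ignoring rarity
richness = hill_number([97, 1, 1, 1], q=0)  # -> 4.0
```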

Simpson’s index can be converted to an effective number of species in this way. The inverse Simpson’s index is also used as a measure of diversity, and is also referred to as Simpson’s index, because we like things to be clear and easy to understand in ecology.

The inverse Simpson’s index, \(\frac{1}{\lambda}\), happens to also be \(^2D\), the second-order version of this generalized equation for converting to effective species numbers, where \(q = 2\).

\[^2D = \frac{1}{\sum_{i=1}^S p_i^2}\]

When you calculate \(^2D\) for the NPL, what you get is the effective number of teams in each phase – the number of equally successful teams that would give the same value of parity that was actually observed in each phase.**


DID PHASE 2 HAVE HIGHER PARITY?


So, after all this theoretical work, what’s the actual answer to my question? Let’s look at the effective number of teams in each phase.

I calculated the effective number of teams in each phase with the inverse Simpson’s index, and bootstrapped confidence intervals for these measurements to compare them.
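My actual bootstrap was done in R, but the idea can be sketched like this in Python, under the simplifying assumption that each standings point is an independent draw from the league’s point distribution (the team totals here are invented, not NPL data):

```python
import random
from collections import Counter

def inv_simpson(counts):
    """Inverse Simpson's index: the effective number of teams."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts)

def bootstrap_ci(counts, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the inverse Simpson index."""
    rng = random.Random(seed)
    # Expand team totals into one label per point, then resample points
    labels = [team for team, c in enumerate(counts) for _ in range(c)]
    stats = sorted(
        inv_simpson(list(Counter(rng.choices(labels, k=len(labels))).values()))
        for _ in range(n_boot)
    )
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

lo, hi = bootstrap_ci([80, 60, 40, 20])
```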


Parity in the NPL. Inverse Simpson’s indices, with confidence intervals, for the NPL Phase 1 and Phase 2, showing the effective number of teams in each phase.


Out of a lobby with 16 teams, the effective number of teams in Phase 1 was 14.802, 95% CI = [14.487, 15.004], and 15.285 in Phase 2, 95% CI = [15.028, 15.42].

The values can be interpreted as the number of “typical” teams in the league. If the effective number of teams were 1 or 2, that would mean only a few teams monopolized the vast majority of the points; if it were 16, that would be perfect parity.

Parity did increase from Phase 1 to Phase 2, as expected. You can see that the confidence intervals – just barely – don’t overlap, meaning that there is a significant difference between the parity of the two phases, but the size of the effect is fairly small.

How small?


NPL PARITY IN CONTEXT


To put this difference in parity in perspective, I created a dataset of all the standings points from every major international competitive PUBG league with a 16-team lobby*, and plotted their parity values to show the overall spread in parity.


Parity in competitive PUBG. Inverse Simpson’s indices, with confidence intervals, for each phase of every major international league with a 16-team lobby, showing the effective number of teams in each league. Numbers on the ends of league acronyms refer to the phase.


Phase 2 of the NPL actually had the highest parity of all leagues so far, with Phase 1 not far behind.

PEL also has high parity – note that PEL Phase 2 is not finished yet. I’m curious to see how PEL Phase 2 compares to Phase 1 – it shouldn’t be much different, because there’s been no change in the lobby between phases yet.

The league and phase with the lowest parity is Phase 1 of the ESL LA League, in Latin America. LA increased in parity from Phase 1 to 2, but apart from that change and the one in the NPL, there have been no major changes in parity between phases in the other leagues.

I’d be interested to look into the relationship between parity and the number of matches played.


NPL LORENZ CURVES IN CONTEXT


I added LA Phase 1 to the Lorenz curve plot for comparison, so you can see where the NPL phases fit into the overall variability in parity.


Lorenz curves for NPL Phase 1 and Phase 2, and ESL LA League Phase 1.


This post was a bit dense, but I hope I explained things well enough to at least get across why I find everything here so interesting!

My .Rmd file is here.


tl;dr I looked at parity, with a tangent about ecology. NPL Phase 2 has a more competitive lobby than Phase 1, but not by much. NPL Phase 2 has the highest parity so far out of all competitive PUBG leagues internationally.


* I couldn’t use leagues or tournaments with a round-robin format, like the PKL or PCL, for the purposes of this comparison.

** Any ecologists reading this might ask why I didn’t use the Shannon index, with \(q = 1\), which less severely discounts rare species than the Simpson index. The Shannon index is said to better balance how it weighs rare and abundant species. The answer is: it’s a lot harder to bootstrap confidence intervals for the converted Shannon index, and I couldn’t find an R package that someone else had already written to do it for me :)