When I found out about FATA’s illness during ESL One Genting, it was the middle of the night in my timezone. I was still half asleep when the text arrived, so while still kind of dreaming, I thought about building a statistical simulation to numerically assess the risk of not qualifying for TI through DPC (Dota Pro Circuit) points. At that point everyone on the team, including myself, was pretty confident about our qualification chances, but I was curious, so I decided to run the simulation anyway.

Mathematically speaking, to guarantee qualification for TI, a team needs 6037 DPC points. Viewers might therefore think that Team Secret needs roughly 1800 more points to qualify. Strictly speaking this is correct, but the statistical situation is a bit more nuanced than that. The probabilities of attending tournaments and of obtaining points are not uniform across teams. I estimated them from current DPC stats, as explained in detail in the Method section.

Normally, I’d first present the assumptions and method, but in this case I prefer to start with findings and results because TL;DR. People who are only interested in results can read them and move on.

Findings

Team Secret and Team Liquid have a qualification probability of over 99%. Since TL are already invited to TI as the reigning champions, the point is moot for them, but even without that invite they would already be pretty much in.

In order for TS and TL to not qualify, the remaining DPC points need to be distributed among other top-10 teams, inversely proportional to their ranking. This is an extremely unlikely event, which is why so few of the simulations show failure to qualify through DPC for both those teams.

While the points mathematically required are ~6k, statistically speaking 4k points should be enough to qualify through DPC.

If we assume that teams outside the top-10 might be stronger than their results show, or that new teams will form and challenge the others, qualification probabilities for OG, EG and VG drop more than the others. This is because their position in the DPC standings is not as secure as that of the other teams. Mineski seems to have a better chance of maintaining their position because of the number of teams in SEA: they have a higher chance of participating in tournaments, so they have more opportunities to win. In my opinion, EG will get invited to more tournaments than my simulation shows due to their popularity and strength in NA, but obviously I can’t put such an assumption in the simulation, so I treated them just like any other NA team.

Results

To run the simulation, I need probability distributions for tournament participation and for top-4 placement in the tournaments a team attends. To estimate these, I used the numbers in the stats page of DPC on Liquipedia, which gives information on tournament placements, qualification attempts and participation. The initial proportions had a lot of zeros because some teams haven’t participated yet and even more teams don’t have any results. To deal with this, I also estimated smoothed proportions. Details are in the Method section.

To further assess the probabilities and remove any bias towards my team, I devised 6 conditions which are progressively more restrictive. These conditions are outlined as:

Using original participation and original placement proportions for simulations

Using smoothed participation and smoothed placement proportions for simulations

Using originals again but assuming Team Secret doesn’t get any points in the future for simulations

Using smoothed proportions but assuming Team Secret doesn’t get any points in the future for simulations

Using originals but assuming only top-9 DPC teams get points, excluding Team Secret for simulations

Using smoothed proportions but assuming only top-9 DPC teams get points, excluding Team Secret for simulations
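The six conditions above are really a 2 x 3 grid: two sets of proportions crossed with three levels of restriction. A minimal sketch of that grid follows; the label strings are hypothetical names chosen here for illustration, not identifiers from the original analysis.

```python
from itertools import product

# Hypothetical labels; the post only describes the conditions in words.
PROPORTIONS = ("original", "smoothed")
RESTRICTIONS = (
    "none",                     # every team can earn points
    "secret_scores_nothing",    # Secret may attend but never places top 4
    "only_ranks_2_to_9_score",  # only the other top-9 teams earn points
)

# Restrictions vary slowest, so the order matches the list above.
CONDITIONS = [
    {"proportions": proportions, "restriction": restriction}
    for restriction, proportions in product(RESTRICTIONS, PROPORTIONS)
]

for number, condition in enumerate(CONDITIONS, start=1):
    print(number, condition)
```

Keeping the two axes separate makes it easy to see that conditions 1-2, 3-4 and 5-6 differ only in which proportions are used.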

For some or all of these conditions, the following results are presented:

Probability of qualification of 31 teams

Distribution of DPC points collected by 31 teams in simulation by the end of the circuit

Distribution of overall qualified and non-qualified DPC points

Probabilities

The first analysis had two conditions: simulating probabilities of qualification for all teams using the original proportions and using the smoothed ones. Notice I am referring to them as proportions because they are not exactly probabilities, more like “pseudo-probabilities” obtained through proportions. In this initial case, there is no restriction on which teams can get QPs: all teams can qualify for all upcoming tournaments and possibly place top-4.
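For readers who want to see what a qualification probability means in code, here is a minimal sketch. It assumes each simulation run produces a dictionary of final points per team, and that the top `n_slots` teams by points qualify. Both the function name and the default of 8 slots are assumptions made for this illustration.

```python
from collections import Counter


def qualification_probabilities(final_points_per_run, n_slots=8):
    """Fraction of runs in which each team finishes inside the top n_slots.

    final_points_per_run: list of {team: final DPC points}, one dict per run.
    n_slots: number of DPC qualification slots (assumed default, adjust as needed).
    Ties are broken by dictionary order, which is arbitrary.
    """
    qualified = Counter()
    for final_points in final_points_per_run:
        ranked = sorted(final_points, key=final_points.get, reverse=True)
        qualified.update(ranked[:n_slots])
    n_runs = len(final_points_per_run)
    teams = set().union(*final_points_per_run)
    return {team: qualified[team] / n_runs for team in teams}


# Toy example with made-up numbers: two runs, two slots.
runs = [
    {"A": 10, "B": 5, "C": 1},
    {"A": 10, "B": 2, "C": 7},
]
print(qualification_probabilities(runs, n_slots=2))
```

In the toy data, team A qualifies in both runs while B and C qualify in one run each, so their estimated probabilities are 1.0, 0.5 and 0.5.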

Edit: Apologies, there seems to be a cap to view interactive plots, so I had to replace them with static ones. The interactive ones should be available here when the cap lifts.

Initially, this was the only analysis I planned on doing. But when Team Secret and Team Liquid came up as 100% qualified, I was a bit surprised and pretty convinced something was wrong. So I decided to further challenge these results by putting restrictions on it.

Another partially surprising outcome of smoothing was that it penalized teams that are in 4–9 positions far more than top 3 (in terms of qualification probability). I guess this was expected due to the discrepancy between existing points of top-3 and the rest of top-8 but I hadn’t thought of this initially.

To challenge these results, the first restriction I put on the analysis was assuming Team Secret would not get any points from now on. Remember, I initially started this because I wanted to assess the risk of not qualifying through DPC, which is why I restricted only Team Secret and not Team Liquid. In this case, Secret can participate but they cannot obtain any points (i.e. they can’t place top-4). I felt like this was the more reasonable assumption since we are already attending the Katowice and PGL Majors and hope to attend others.

The change was minimal, but we no longer had 100% qualified status; instead it was 99.94%. There is a slight increase in the chances of the 4th–8th teams, which is expected, but the change doesn’t seem statistically significant.

So finally, to restrict the playing field a bit more, I assumed that all points will be distributed among the top-9 teams excluding Secret. At the time of this post, Secret holds the 1st spot in the DPC rankings, so this means the teams ranked 2–9. In this case, Secret can’t attend the tournaments at all.
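One way to picture the two restrictions is as masks on the sampling weights. The sketch below is an assumption about how this could be wired up, not the code behind the plots; `restrict_weights` and the restriction labels are hypothetical names.

```python
def restrict_weights(weights, restriction, standings, secret="Team Secret"):
    """Return a copy of the sampling weights with one restriction applied.

    weights:   {team: pseudo-probability}
    standings: teams ordered by current DPC rank, best first
    """
    restricted = dict(weights)
    if restriction == "secret_scores_nothing":
        # Secret stays in the pool of teams but can never be drawn.
        restricted[secret] = 0.0
    elif restriction == "only_ranks_2_to_9_score":
        # Only the current top 9, minus Secret, keep a non-zero weight.
        allowed = set(standings[:9]) - {secret}
        restricted = {
            team: (weight if team in allowed else 0.0)
            for team, weight in restricted.items()
        }
    elif restriction != "none":
        raise ValueError(f"unknown restriction: {restriction}")
    return restricted
```

Under this reading, the first restriction would be applied to the placement weights only (Secret attends but never places), while the second would be applied to both participation and placement weights (Secret does not attend at all).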

Again, slight change in our qualification probability, but not by much. As expected, positions of 2–8 are reinforced.

Point Distributions

I am also presenting the point distributions for the relevant cases. However, due to scale issues, these box-plots show only the top 10 teams. Also, due to the sheer amount of data points, interactivity was not possible, so images will have to do.

Distribution with no restriction

Distribution with Secret Getting No Points in Upcoming Tournaments

The second plot shows Secret’s current number of points is higher than the majority of the points EG and Mineski might end up obtaining at the end of the pro circuit. This explains the behavior of probability distribution to qualify.

I’m not adding the case when only Top-9 teams except Secret get points because it’s essentially the same as above.

Overall Point Distribution for Qualification

So far, one thing is clear from all of these results: Teams likely won’t need all of the 6000+ points to qualify (duh). In order to examine the amount of points that might be enough for qualification, I grouped the simulated points in terms of “Enough to Qualify” and “Not Enough to Qualify”. Keep in mind that there will be overlaps between these two categories because of different placements and participations but it should give us an idea about how many points are actually needed.
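A minimal sketch of how the two groups can be compared, assuming every simulated outcome is stored as a `(final_points, qualified)` pair. The function names and the toy numbers are made up for illustration.

```python
def overlap_band(outcomes):
    """Return (lowest qualifying total, highest non-qualifying total)."""
    outcomes = list(outcomes)
    qualified = [points for points, ok in outcomes if ok]
    missed = [points for points, ok in outcomes if not ok]
    return min(qualified), max(missed)


def qualification_rate_at_or_above(outcomes, threshold):
    """Share of outcomes with at least `threshold` points that qualified."""
    flags = [ok for points, ok in outcomes if points >= threshold]
    return sum(flags) / len(flags) if flags else None


# Toy outcomes with made-up numbers, not simulation results.
toy = [
    (5200, True), (4100, True), (3700, True),
    (4300, False), (3900, False), (1500, False),
]
print(overlap_band(toy))                          # (3700, 4300)
print(qualification_rate_at_or_above(toy, 4000))  # 2 of 3 qualified
```

The band between the two returned values is where the “Enough to Qualify” and “Not Enough to Qualify” groups overlap; scanning thresholds inside it shows where qualification becomes the likely outcome.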

In this particular case I only used the unrestricted simulation results, simply because the “restriction” is actually already inherent in participation probabilities and top-4 placement probabilities. And so far, we haven’t seen any real difference between conditions.

Histograms of qualifying points with qualification outcome

It would have been great if I could embed an interactive plot here to show the “soft cut-off” point for qualification, but apparently I have too many data points. This soft cut-off appears to be around 4000 points, whereas the “hard cut-off”, a.k.a. the mathematically required number of points, is 6037. The soft cut-off is about 66% of the hard cut-off (4000 / 6037 ≈ 0.66). This might be the most interesting outcome of this analysis.

All in all, I was not really expecting such a drastic picture when I started this simulation, even though I was pretty confident in our position. I am still inclined to think that there are cases that could change the outlook of this analysis. I am not sure whether these would impact the outcome so drastically, but from a critical perspective this analysis does have some issues.

Fixed number of teams. Currently 31 teams have participated in DPC in one way or another. When new teams enter the arena, this picture will change. But I expect this picture to change in favor of high-ranking teams if new teams start taking points in the future simply because high-ranking teams have already secured so many points.

Sort of knowledge-based probability distribution. As I mentioned, this is a double-edged sword. On one hand, we want our initialization to have some representation power; on the other hand, inducing bias is really bad. I did my best to avoid bias and I hope I was successful. It is possible that changing the initial sampling distributions would lead to different results. Using DPC points instead of placements as the basis for sampling placements might be an idea, but this would favor high-ranking teams even more.

There is also the possible issue that sampling from so many candidates increases the variance of the DPC point distribution a bit. For instance, while sampling for 1st place, I might be oversampling the weaker teams and under-sampling the stronger ones. To be fair, that was sort of intended. I thought that exploring the possibility of lower-ranking teams having more of a chance than the results said had merit. But in the end, all conditions, no matter how loose or restrictive, ended up giving similar results.

Method

In order to simulate the qualification, we need to understand the process first. Every tournament has participating teams. These teams are either invited, or they qualify through open/main qualifiers. Every region has these qualifiers according to the DPC rules given by Valve. This tells us that we need to simulate participation first.

After participating teams are decided, teams that end up in top-4 obtain points based on the total points of the tournament and the proportions given by Valve in the rules. This information tells us we need to simulate the placements. Note that for points, we are only interested in top-4 so simulating these should be enough. One thing to keep in mind is that points are not allocated equally so the simulation needs to account for ranking.

So the outline of the process for each tournament is as follows:

Simulate the participating teams

Simulate top-4 rankings of the participants

Some unplayed tournaments have already announced participants (Starladder, ESL One Katowice, Bucharest Major etc.) so they don’t need to be simulated. Some announced a few invites (Epicenter), so these need to be partially simulated.
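The two steps, including the handling of already-announced participants, can be sketched roughly as below. This is an illustration under stated assumptions rather than the code used for the results: the function names and the tournament dictionary layout are hypothetical, and `PLACE_SHARES` holds placeholder values, since the actual split is defined in Valve’s rules and is not quoted in this post.

```python
import random

# Share of a tournament's points for places 1-4. Placeholder values only.
PLACE_SHARES = (0.5, 0.3, 0.15, 0.05)


def weighted_sample_without_replacement(weights, k, rng):
    """Draw up to k distinct teams, each draw proportional to remaining weights."""
    pool = {team: w for team, w in weights.items() if w > 0}
    picked = []
    while pool and len(picked) < k:
        teams = list(pool)
        choice = rng.choices(teams, weights=[pool[t] for t in teams], k=1)[0]
        picked.append(choice)
        del pool[choice]
    return picked


def simulate_tournament(tournament, part_weights, place_weights, rng):
    """Return {team: points earned} for one simulated tournament."""
    # Step 1: announced teams are kept; only the open slots are simulated.
    participants = list(tournament["announced"])
    open_slots = tournament["size"] - len(participants)
    candidates = {t: w for t, w in part_weights.items() if t not in participants}
    participants += weighted_sample_without_replacement(candidates, open_slots, rng)

    # Step 2: the top-4 order is drawn among the participants only.
    top4 = weighted_sample_without_replacement(
        {t: place_weights.get(t, 0.0) for t in participants}, 4, rng
    )
    return {team: tournament["points"] * share
            for team, share in zip(top4, PLACE_SHARES)}


def simulate_season(current_points, tournaments, part_weights, place_weights, rng):
    """Add simulated points from all remaining tournaments to current totals."""
    totals = dict(current_points)
    for tournament in tournaments:
        earned = simulate_tournament(tournament, part_weights, place_weights, rng)
        for team, points in earned.items():
            totals[team] = totals.get(team, 0) + points
    return totals
```

A tournament with a full participant list simply has no open slots, so only the placement step is random; a partially announced one fills the remaining slots from the participation weights. Passing a seeded `random.Random` instance as `rng` keeps runs reproducible.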

At this point, we need plausible initial “probability” distributions for participation and ranking. It is also wiser to consider minor participation and major participation separately. I didn’t see any need to separate the minor and major ranking distributions, because these rankings depend on participation: if we get the participation distribution right, this information should be implicitly included.

So, the initial distributions… In every simulation this is probably the most difficult one. You don’t really want your bias(es) to be present in the initial distribution, but you also want it to be informative (at least more informative than uniform distribution). Luckily we have Liquipedia (shoutout to you guys) and they collect all sorts of information.

As an initial estimate, I used the numbers in the stats page of DPC on Liquipedia. This page gives us information regarding tournament placements, qualification attempts and participation. Out of these tables, I calculated the “top-4 placement proportion” p_top4 as follows,

$$p_{top4} = \frac{\sum_{i=1}^{4} top_i}{N}$$

where N is the total number of tournament participations and top_i is the number of i-th placements.

The participation proportion p_part is calculated as follows,

where MQ, OQ, INV and NQ are as defined in the Liquipedia stats page. I deliberately ignored the TBD as it introduces an unnecessary layer of complexity to the problem. Keep in mind that at this point we don’t actually have probability values but more like “pseudo-probability” values; proper probabilities will be introduced later by normalization.

It is obvious that this would lead to a lot of zeros for teams who have not participated or placed in top 4. This also introduces a “1” for Team Liquid because they have placed top 4 in every tournament they attended so far. Pretty impressive but poses a problem for the simulation.
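To make the zeros and the “1” concrete, here is a small sketch. It reads the top-4 placement proportion as the number of top-4 finishes divided by the number of participations, which is an interpretation of the definitions above. The team names and counts are made up, and `normalise` is one simple way to turn pseudo-probabilities into a proper distribution.

```python
def top4_proportion(top_counts, n_participations):
    """Share of attended tournaments that ended in a top-4 finish.

    top_counts: (firsts, seconds, thirds, fourths)
    """
    if n_participations == 0:
        return 0.0  # no attendance yet, so no evidence of placing
    return sum(top_counts) / n_participations


def normalise(weights):
    """Rescale non-negative pseudo-probabilities so they sum to 1."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all weights are zero")
    return {team: weight / total for team, weight in weights.items()}


# Made-up counts, not Liquipedia data.
stats = {
    "always_top4": ((2, 1, 1, 0), 4),  # placed top 4 every time -> 1.0
    "mid_table": ((0, 1, 0, 1), 8),    # 2 top-4 finishes in 8 -> 0.25
    "newcomer": ((0, 0, 0, 0), 0),     # never attended -> 0.0
}
proportions = {team: top4_proportion(*record) for team, record in stats.items()}
print(proportions)
print(normalise(proportions))
```

After normalisation the newcomer still has a weight of exactly zero, so it could never be sampled into the top 4, which is the problem the smoothing step is meant to address.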

Another problem with these values, related to the previous one, is that they overvalue high-ranking teams and undervalue low-ranking ones. In Dota, we have seen too many underdog stories to know that any team can upset any other team, no matter how unlikely (R.I.P Kiev Major :( ).

In light of these issues, it would be better to smooth the proportions to have the following qualities:

No 0 and 1 values

Reduce the values for high ranking teams and increase them for low ranking ones.

This is very difficult to do by manually manipulating the numbers, so instead I used a smoothing function. Let p be the original proportion value; the smoothed proportion p_smooth is calculated as follows:

The 0.75 is the smoothing parameter, while (1–0.75) is used as a multiplier to ensure power rankings from the original proportions are emphasized further. I tried different parameters and they all resulted in roughly the same thing, so I settled on 25/75 because it looks nice.
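As an illustration of the two desired qualities only, here is one simple smoothing scheme: blending each proportion towards the mean of all proportions. This is an assumed stand-in chosen for the sketch and not necessarily the function used for the results in this post. The name `shrink_towards_mean` and the toy values are hypothetical.

```python
def shrink_towards_mean(proportions, smoothing=0.75):
    """Blend each value with the overall mean.

    smoothing=0.75 gives 75% weight to the mean and 25% to the original
    value, so the ordering of teams is preserved while extremes are pulled in.
    """
    mean = sum(proportions.values()) / len(proportions)
    return {
        team: smoothing * mean + (1 - smoothing) * p
        for team, p in proportions.items()
    }


# Toy values, not real proportions.
toy = {"strong": 1.0, "average": 0.5, "unplaced": 0.0}
print(shrink_towards_mean(toy))
# {'strong': 0.625, 'average': 0.5, 'unplaced': 0.375}
```

With this kind of blend, no value stays at exactly 0 or 1 as long as the mean is strictly between them, high values go down, low values go up, and the ranking between teams is unchanged.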

So the original proportions and smoothed ones look like this:

Participation Probabilities