With this many sets, we can get a lot more interesting data about how matchups behave at different skill levels than I could from the 1835 sets in the first version of this project.

Looking at every player against every player in true bellum omnium contra omnes fashion, we can see the following:

This is a zoomed-out picture of what boils down to every reasonably important set I could find in 2017, and, consistently, it looks almost identical to the same calculation I performed in 2015:

This is the part of the writeup where I expected to examine the differences between characters on a macro scale and explore how each character does on average by skill level. However, this ends up being an almost entirely meaningless exercise, since every top-tier is virtually identical when viewed through this super-zoomed-out lens. For example, here is the steepest winrate curve, followed by the gentlest curve:

Even Ice Climbers, who I expected to have a rougher correlation between skill and winrate, ended up pretty much the same as everyone else. It's likely that this is a byproduct of the tiers being much wider than what I used last time, but even so I expected a bit more variety between characters, so this was pretty interesting to see.

One of the more popular aspects of the pilot version of this project was the correlation between skill and winrate. There are three calculations that interest me about these charts - Upset Potential, Volatility, and Correlation between winrate and skill. These can be calculated fairly easily. Upset chances come from the definite integral of the regression function between each endpoint and the zero skill difference point (less than 0 = being upset, greater than 0 = upsetting others). Volatility is the derivative of the function at zero skill difference (flatter curves mean each "unit skill" affects winrate less dramatically). Correlation between skill and winrate comes from the residual sum of squares of the data against the regression function.
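As a rough sketch of those three calculations, the snippet below fits a polynomial to (skill difference, winrate) points with NumPy and reads the three numbers off the fitted curve. Several things here are assumptions for illustration rather than the exact method used for the charts: the polynomial degree, the function names, the choice to divide each integral by the interval width so it reads as an average rate, and the sample points, which are made up.

```python
import numpy as np


def fit_curve(skill_diff, winrate, degree=3):
    """Least-squares polynomial fit of winrate against skill difference.

    The degree is an assumed default, not a value taken from the writeup.
    """
    return np.polynomial.Polynomial.fit(skill_diff, winrate, degree).convert()


def mean_winrate(curve, lo, hi):
    """Definite integral of the curve over [lo, hi], divided by the width."""
    antiderivative = curve.integ()
    return (antiderivative(hi) - antiderivative(lo)) / (hi - lo)


def volatility(curve):
    """Slope of the fitted curve at zero skill difference."""
    return curve.deriv()(0.0)


def residual_sum_of_squares(curve, skill_diff, winrate):
    """Sum of squared gaps between the observed points and the curve."""
    residuals = np.asarray(winrate) - curve(np.asarray(skill_diff))
    return float(np.sum(residuals ** 2))


# Made-up illustrative points, not data from this project.
skill_diff = np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4], dtype=float)
winrate = np.array([0.95, 0.90, 0.80, 0.66, 0.50, 0.34, 0.20, 0.10, 0.05])

curve = fit_curve(skill_diff, winrate)
# Assumed sign convention: positive differences are the underdog side.
print("chance to upset others:", mean_winrate(curve, 0.0, 4.0))
print("chance to be upset:", 1.0 - mean_winrate(curve, -4.0, 0.0))
print("volatility at 0:", volatility(curve))
print("RSS:", residual_sum_of_squares(curve, skill_diff, winrate))
```

A flatter curve gives a slope closer to zero, and a smaller residual sum of squares means the points sit closer to the regression function.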

The really exciting thing about this data is that the relationship between skill and winrate is much clearer than in previous results; the tiers are a little wider and there's much more information, which led to pretty well-defined regression curves. At the same time, this analysis provides almost no useful information whatsoever, which I find to be rather exciting. While there is some variation in the relative error of the regression functions, on a macro level almost every character performs virtually identically - you'll have roughly a 14% chance to make upsets, roughly a 14% chance to be upset yourself, and adjusting the two players' skill levels will on average affect the winrate virtually identically. Some characters exhibit a slightly greater chance to pull upsets or to be upset, but it's mostly attributable to noise, as no character moves more than three percent away from the average on either side except for Puff, who has a 4% (no pun intended) above-average chance to pull upsets.

Looking at this information matchup by matchup provides some "real" information, some of which I will summarize below.

By far the least volatile matchup among the top tiers, in which the difference in skill plays the smallest role in determining the outcome, is Peach vs Ice Climbers.

You could make the case from this that Peach vs Ice Climbers is the hardest matchup in the game - a matchup in which being dramatically better than your opponent could be virtually meaningless for predicting the outcome (what use is a 50/50 matchup against equally skilled opponents, if you're 5 tiers above your opponent and it is still 50/50?). Of course, the correlation is not very impressive (likely due to a lack of data), and like last time most of the matchup charts suffer from lack of data even with this many sets - certain characters are simply too rare. Among the more common characters, the relationships end up reasonably well-defined with some surprising curves.

These were the three that jumped out to me as reasonably well-formed, and they're pretty interesting to look at.

Sheik-Fox and Peach-Puff jumped out at me due to both their well-formedness and their relatively tame matchup ratios. Going by popular opinion, you might believe that these matchups are very lopsided, but in practice they both seem to be "losing, but not so badly". None of the points deviate wildly from the regression function, and for the most part they just seem relatively straightforward. As I stated before, matchup ratios are not win percentages (especially due to gaps between the theoretical best play and the current best human play), but for players who use real-world performance to generate their matchup numbers, it's good to keep in mind that even canonically "really bad matchups" are, by skill-normalized winrate, usually around 6-4 at worst.

Falco enjoys a small edge in winrate over Fox players, despite underperforming the regression line at roughly equal skill levels, which, keep in mind, is the point with the most information. This highlights a relative weakness of using this kind of approach to "find winrates": the resulting polynomial weighs each aggregate set of matches at a specific skill differential as one data point, even though the points are based on different numbers of matches. You can imagine an alternative approach where you start with each point being equal to the global winrate curve, and every time somebody wins or loses at a specific skill difference you update that point's location using Bayes' Theorem, but that's a bit beyond the scope of this project at this time.
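For the curious, one common way to set up that kind of Bayesian update is a Beta prior centred on the global winrate at each skill difference, updated with the matchup's own wins and losses. This is only a sketch of the idea, not something used in this project: the function name and the prior_strength parameter (how many "pseudo-sets" the global curve counts for) are hypothetical.

```python
def posterior_winrate(prior_winrate, wins, losses, prior_strength=10.0):
    """Posterior mean winrate at one skill difference.

    prior_winrate: value of the global winrate curve at this skill difference.
    prior_strength: hypothetical pseudo-count giving the prior its weight.
    """
    # Beta(alpha, beta) prior whose mean equals the global winrate.
    alpha = prior_winrate * prior_strength
    beta = (1.0 - prior_winrate) * prior_strength
    # Bayes' Theorem with a Beta prior and win/loss data just adds the counts.
    alpha += wins
    beta += losses
    return alpha / (alpha + beta)


# A point backed by only four sets barely moves off the global curve...
few_sets = posterior_winrate(0.5, wins=3, losses=1)
# ...while the same ratio over 400 sets pulls it most of the way to 0.75.
many_sets = posterior_winrate(0.5, wins=300, losses=100)
print(few_sets, many_sets)
```

The appeal is that points with few sets stay close to the global curve while points with many sets are dominated by their own results, which addresses the equal-weighting issue described above.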

In the near future I'd like to do a similar analysis for only top 100 data by doing (((rankwin-1)-(ranklose-1))%10) on ssbmrank data instead of using these tiers - tafokints did a similar analysis on Fox vs Marth data which was great but ultimately fruitless at changing most people's opinions.
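Read literally, the -1 terms in that expression cancel and the modulo wraps the rank gap around every ten places. The sketch below assumes the intent is instead to bucket ranks into tiers of ten (1-10, 11-20, and so on) by integer division and then take the difference between the two tiers. The function names and the tier_size parameter are hypothetical, and this is one reading of the formula rather than a settled method.

```python
def top100_tier(rank, tier_size=10):
    """Map a 1-based rank to a 0-based tier: ranks 1-10 -> 0, 11-20 -> 1, ..."""
    return (rank - 1) // tier_size


def tier_difference(rank_win, rank_lose, tier_size=10):
    """Winner's tier minus loser's tier; negative when the winner ranks higher."""
    return top100_tier(rank_win, tier_size) - top100_tier(rank_lose, tier_size)


# Rank 1 beating rank 10 is a same-tier result under this bucketing,
# while rank 95 beating rank 3 is a nine-tier upset.
print(tier_difference(1, 10), tier_difference(95, 3))
```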