Community Info: Issues & Moving Forward

February 24, 2016

Please note: this was originally intended to be posted in early January as a collaboration between Callimonk and myself, but for a number of reasons it has been delayed until now. Calli will post her parts on her own blog soon.

In December last year Zooper published two posts on Blizzardwatch about what are commonly called “Cookie Cutter” builds: the idea that there is generally a single accepted talent build, and the player attitudes that grow up around that idea. These posts are only a small part of what our posts will address, but they also serve as a starting point for the rest of our points.

It’s worth pointing out that Zooper’s posts make some generalizations, partly around how large the “problem” of binary answers in guides actually is. In a quick survey of 6 Fury Warrior guides, 3 gave a binary answer (MMOC, Noxxic and Wowhead), the Icy Veins and WoW Forum guides explained that rage capping was the issue with Furious Strikes, and Method’s guide indicated that Furious Strikes is “slightly weaker” than Sudden Death. This means 50% of the guides had a binary or non-explanatory answer, which drops to 33% (2 of 6) if you set the Wowhead guide aside on the grounds that Wowhead guides are intended for beginners.

I’d also argue that Zooper misrepresented the quoted guide statement in her survey, as the WoW Forum guide was quoted as saying:

Sudden Death is pretty much the only option for Fury, damage wise

However, the guide’s full statement on the subject is longer and explains why Sudden Death is the better option:

Level 45: Sudden Death, pretty much the only option for Fury damage wise now given the fact that using Furious Strikes will force you to cap rage and have extreme difficulties in not wasting your rage and slightly diminishes the effect of Anger management.

It’s possible to argue that had the full quote been included rather than a shortened absolute, the survey responses would have been somewhat different, probably with a much higher number of No responses to the first Guide statement question. The survey would then have compared a Quantitative statement (Data), a Qualitative statement (Guide) and the somewhat arbitrary “Top Player” statement as three different approaches to the same question.

The other problem is that the Data statement was “according to simulations”, which is open to misinterpretation because the result a simulation returns depends on the fight style, settings, talent combinations and even gear.

Current Issues

Min/Maxing

Part of the problem here is that min/maxing is not only expected, but has become more extreme over the last few expansions. The 80/20 rule states that 80% of the result comes from 20% of the effort, and vice versa, and that certainly applies here. Many forum threads are devoted to working out and arguing over which combination of items makes up the ultimate “best in slot” list, but in the majority of cases knowing item levels and stat priorities is sufficient.

For example, let’s look at the neck & waist slots for Elemental in Hellfire Citadel. As some background, the stat priority for Elemental is Multistrike > Haste > Crit > Versatility > Mastery.

Choker of Forbidden Indulgence has high Haste and low Multistrike, while Vial of Immiscible Liquid has high Mastery and low Crit. Vial is also 5 item levels higher, but because so much of its budget is Mastery (the lowest priority stat), the Choker is still the better choice. Cursed Demonchain Belt (high Haste, mid Multistrike) versus Girdle of the Legion General (high Mastery, low Haste, +5 item levels) is a similar case, where the high Mastery outweighs the higher item level.

Even though you want the absolute best items, it’s important to remember that it will take a long time to get them, due to random drops, bonus effects and gear competition within a raid team. Don’t focus so much on what the best setup is, but rather on whether the items that have dropped are upgrades for you.

The difference between the absolute “best” and “worst” items in a single slot is actually very small in the overall scheme of things. Using myself as an example on AskMrRobot we can compare my 730 Warforged Socketed Finkle’s Flenser with a Mythic Gavel of the Eredar (ilvl 735). Upgrading will result in a 100 point gain (~3.88%) which seems pretty big, but when you compare it to my overall score of 13,972 it becomes a 0.718% overall upgrade, and this is mostly due to the ilvl difference.

This can also be applied to racial differences, as the difference between the “best” and “worst” passives can also be well below 1%. Yes, every upgrade helps, but if you need a personal <1% DPS upgrade (~0.0667% raid DPS) in order to kill the boss, then staying alive and executing your rotation or the boss mechanics better is more likely to result in a kill.
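To show where a figure like ~0.0667% comes from, here is a minimal sketch. It assumes a raid with 15 damage dealers who all do roughly the same DPS; both the function name and those assumptions are mine, purely for illustration.

```python
# Illustrative only: assumes every damage dealer contributes equal DPS,
# so one player's gain is diluted by the number of damage dealers.
def raid_dps_gain(personal_gain_pct: float, dps_players: int) -> float:
    """Percentage gain in total raid DPS when one of `dps_players`
    equal contributors improves by `personal_gain_pct` percent."""
    return personal_gain_pct / dps_players


gain = raid_dps_gain(1.0, 15)
print(f"{gain:.4f}% raid DPS")  # 0.0667% raid DPS
```

In a real raid the contributions are not equal, so the exact figure will shift, but the dilution effect is the same.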

Gear Optimisation & the Changing Stat Weight Cycle

One of the flaws with “best in slot” gear lists is that stat weights are often misused. Stat weights are the result of increasing a stat by a specific amount (the default in SimulationCraft is 183 for most stats), calculating the DPS gain, then dividing that back down to a gain per single point of the stat. This means weights show the per-point gain for the currently equipped set, i.e. where to go from Point B rather than how to get from A to B.
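The calculation described above can be sketched roughly as follows. The damage model here (`toy_dps`) is entirely made up so the example can run on its own; in practice the DPS figure would come from a simulator, and the stat values are hypothetical.

```python
def toy_dps(stats: dict) -> float:
    """Made-up damage model with interacting stats; not a real spec formula."""
    base = stats["int"] * 10.0
    haste_mult = 1.0 + stats["haste"] / 9000.0
    crit_mult = 1.0 + stats["crit"] / 11000.0
    return base * haste_mult * crit_mult


def stat_weight(dps_fn, stats: dict, stat: str, delta: float = 183.0) -> float:
    """DPS gained per single point of `stat`, measured at the current gear.

    `delta` defaults to the 183 figure mentioned in the text.
    """
    bumped = dict(stats)
    bumped[stat] += delta
    return (dps_fn(bumped) - dps_fn(stats)) / delta


# Hypothetical current gear set (the "Point B" the weights are measured from).
current = {"int": 5000.0, "haste": 1200.0, "crit": 900.0}
weights = {s: stat_weight(toy_dps, current, s) for s in current}
print(weights)
```

Because the weight is measured from the current stats, running the same function on a different gear set gives different numbers. In this toy model, for instance, the haste weight goes up when the set has more crit.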

Looking at the modified stat weights from my Elemental Shaman guide, we can see a few changes in both absolute priorities and relative values. The two biggest are that from Pre-Raid to T17 Mastery and Versatility swap their absolute order, and from T17 to T18 the relative difference between Haste and Crit goes from being roughly equal to having Haste distinctly higher. Some of these changes are down to how set bonuses work (Multistrike in T17, Haste in T18), but even without them it would be unwise to plan a pre-raid gearset around T18 weights, or vice versa. In other words, understanding the context of stat weights is important.

When players start blindly following priorities without taking this into account it can result in major priority shifts even though the actual values change only slightly. In the above example T17 values Crit (4.56) and Haste (4.76) fairly closely. By following a strict Multistrike > Haste > Crit priority it is possible to switch the values for these two stats resulting in a Multistrike > Crit > Haste order instead, because having more haste makes crit more valuable and vice versa.

It is also likely that the weights for a single target sim will be vastly different from those for multi-target or heavy movement sims, so relying on one set can not only lead to back & forths with gear but also give less than ideal gear sets for different encounter types. This is why any stat weights I list on my guides will be modified to consider multiple environments as well as different talent combinations, so that these values can be used more as the guide post to gearing up that players expect.

Creating massaged weight sets involves running multiple sims for various conditions, and then making judgement calls about what data to include or exclude. An example of this is my Elemental T18 stat weights sheet where I used two talent combinations in 3 different encounter types to look at the overall weights and then evaluated the weight sets in relative values (using both Int and Multistrike as the normalising point).
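As a rough sketch of the relative-value step, the snippet below normalises each weight set against a reference stat and then averages across encounter types. The raw numbers are invented, and the equal-weight average is my simplification for illustration; the real process involves judgement calls about what to include or exclude.

```python
from statistics import mean

# Hypothetical raw weights from three encounter types; all numbers invented.
raw = {
    "single_target": {"int": 10.0, "multistrike": 5.0, "haste": 4.8, "crit": 4.5},
    "cleave": {"int": 12.0, "multistrike": 6.6, "haste": 5.4, "crit": 6.0},
    "movement": {"int": 8.0, "multistrike": 4.0, "haste": 4.4, "crit": 3.2},
}


def normalise(weights: dict, reference: str) -> dict:
    """Express every weight relative to the reference stat (reference = 1.0)."""
    ref = weights[reference]
    return {stat: value / ref for stat, value in weights.items()}


def blended(raw_sets: dict, reference: str) -> dict:
    """Average the normalised weights across all encounter types equally."""
    relative = [normalise(w, reference) for w in raw_sets.values()]
    return {stat: mean(r[stat] for r in relative) for stat in relative[0]}


by_int = blended(raw, "int")          # everything relative to Int
by_ms = blended(raw, "multistrike")   # everything relative to Multistrike
print(by_int)
print(by_ms)
```

Normalising before averaging stops an encounter type with bigger raw DPS numbers from dominating the blend.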

Ideal specs/classes

Everyone wants to be the best they can, and will look at how to get there. Seeing low results on logging sites or in simulations can cause people to switch to higher performing specs, which results in the top end guilds skewing their raid teams and shifts the skill distribution behind classes and specs. This in turn leads to other skilled players switching, further changing the distribution of skill behind each spec, and the pattern repeats down the progression chain until it reaches people who don’t care about this sort of thing.

If we look at Mythic Archimonde parses on WarcraftLogs we can see there are 17,514 entries for DPS specs, and 50.8% are for 4 specs (Marksmanship 17.80%, Arcane 12.77%, Subtlety 10.5% and Balance 9.71%). Players often look at a breakdown like this and see it as a sign that specs not favoured by the very top end guilds are inherently “bad” without considering the extreme approach to min/maxing they use. An example of this would be Arold of Limit, who is considered to be one of the top Elemental PvE players, but he’s only used his Mage in Mythic Hellfire Citadel rather than his Shaman who has no recorded Mythic kills. This means log results show one less skilled player on an Elemental Shaman and one more on an Arcane Mage. Repeat this a few hundred times and the result is a distorted skill base for most specs, and it’s virtually impossible to account for that.

It is possible to look at percentile results like 50th or 75th, but even then if 20% of the top players for a spec have changed to another the players who would be 50th percentile for one spec are now around 63rd, whereas the 50th percentile players for the other are down to 42nd. Distortions like this then feed back into the community perception of how good or bad specs are, which leads to more players changing spec or class in the name of min/maxing optimal raid compositions, and ends up putting more positive or negative pressure on the percentile results in turn.
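The 63rd and 42nd figures can be reproduced with a small sketch. It assumes a percentile means the share of players you outperform, that both specs start with the same number of players, and that everyone who moves ranks above the player in question; the function names are mine.

```python
def percentile_after_departures(percentile: float, fraction_leaving: float) -> float:
    """New percentile once the top `fraction_leaving` of a spec has left.

    Only meaningful while the player is not among those leaving,
    i.e. percentile <= 100 * (1 - fraction_leaving).
    """
    return percentile / (1.0 - fraction_leaving)


def percentile_after_arrivals(percentile: float, fraction_arriving: float) -> float:
    """New percentile once better players equal to `fraction_arriving`
    of the spec's original population have joined it."""
    return percentile / (1.0 + fraction_arriving)


print(round(percentile_after_departures(50, 0.20), 1))  # 62.5
print(round(percentile_after_arrivals(50, 0.20), 1))    # 41.7
```

The player’s own performance hasn’t changed in either case; only the population around them has.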

Bad Guides

A good example of a Min/Max focus, where simulation data is taken and extrapolated into a binary guide, is Noxxic. The primary reason given for many talent selections is “X is a DPS-enhancing talent and is recommended because it yields the highest DPS of any talent in this tier.” or “Y is a DPS-enhancing talent and was not selected as another talent produces higher DPS in this tier.” Their talent pages refer to a talent DPS calculator to see the differences in output based on what appear to be batch run simulations, but there is no context to any recommendation aside from “more DPS”.

Noxxic is also a good example of giving too much data, as seen with the various stat weights shown for different gear levels. In most cases stat priorities won’t change from one raid gear level to the next, with the exceptions usually being the results of set bonus effects. This isn’t to say that Noxxic’s guides can’t be good, but they focus too much on raw data from simulations without context to convey important information to players. The same thing applies to many forum based guides, where often the low standard required for approval is that someone volunteers to write it for this particular section of the community.

Player Expectations

In many guides there are binary statements about which talents to use or not use, and there are two likely explanations: either the writer is a very min/max focused individual, or there has been external pressure from min/max focused individuals to “correct” the guide to follow what they expect. Zooper even mentions this in her first post:

It’s a hard line to walk: if we don’t give in to popular demand, ‘everyone’ runs around saying we are wrong. If we give in (which we do for a small portion of requests), we become part of the problem.

There is a difference between “Min/Max Focused Individuals” and “Theorycrafters”, however, and ultimately I don’t believe that the responsibility for solving this problem lies with either Theorycrafters or the Developers. Even when I provided additional context and explained that multiple talents are valid choices for each tier (as well as why), there were still requests on my Elemental guide for the “best” talent combination, along with insistence that it had to include Unleashed Fury because it “gave the best DPS”. This was despite my having a few qualitative reasons against taking it, and despite the fact that once simulations included AoE or movement, its small lead either diminished or was eclipsed entirely by other talents.

In other words, if Theorycrafters manage to write absolutely perfect guides that explain everything including damage variance and context, players will still demand the “best” talent build aka the “cookie cutter” build.

Moving Forward

Changing Evaluation Methods

There are two important areas that need development within the wider community to present better information and more context: quantifying differences between talents & trinkets, and presenting stat weight data in a way that helps guide players to a good gear set without needing to generate ideal gear combinations.

For stat weights, one of my recent ideas has been to present rough secondary stat ratios to follow instead, so that players can easily see that they have too much haste or too little crit relative to their current gear rather than worrying about having a perfect item combination. This is especially important as an analysis of Hellfire Citadel gear for Elemental revealed that while certain items can be considered “the best”, there are between 6 and 12 other items which can be nearly as good, if not better, depending on bonuses like warforging and sockets. It’s even possible for Heroic Socketed items to be better than Mythic items.

One possible idea would be to start out with a fixed primary stat & run through a process of taking 50 Crit/Haste/Mastery, adding another 50 to one stat at a time to work out which returns the most damage, and then using this highest result as the basis for another repetition of the process. Repeat this for various primary stat levels and you can build a database of the relative values of secondary stats & how they change with different primary stat quantities. Throw talents into the mix and you can provide some interesting and fairly easy to use data to end users on how their gearing is going.
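A minimal sketch of that loop, for one fixed primary stat value, might look like the following. The damage model and its `CURVES` constants are invented stand-ins with diminishing returns; a real version would call a simulator at each step.

```python
STEP = 50

# Invented (max bonus, half-way point) pairs giving each stat diminishing returns.
CURVES = {"crit": (0.50, 2000.0), "haste": (0.60, 3000.0), "mastery": (0.40, 1500.0)}


def toy_damage(primary: float, stats: dict) -> float:
    """Stand-in for a simulator: primary stat scaled by each secondary stat."""
    damage = float(primary)
    for name, (cap, half) in CURVES.items():
        damage *= 1.0 + cap * stats[name] / (stats[name] + half)
    return damage


def greedy_allocation(primary: float, rounds: int, damage_fn=toy_damage):
    """Start at 50 of each stat, then repeatedly add 50 to whichever
    stat gives the highest damage, recording the choice each round."""
    stats = {name: STEP for name in CURVES}
    history = []
    for _ in range(rounds):
        best = max(
            CURVES,
            key=lambda name: damage_fn(primary, {**stats, name: stats[name] + STEP}),
        )
        stats[best] += STEP
        history.append(best)
    return stats, history


final_stats, picks = greedy_allocation(primary=5000, rounds=40)
print(final_stats)
```

Running this for several primary stat values and storing `final_stats` each time would give the kind of database described above. In this purely multiplicative toy model the primary stat doesn’t change which secondary wins each round, so any such dependence would have to come from real simulation data.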

Analysis approaches like this would give us a better understanding of how the value of each stat changes with respect to the others, in a format that lets players of all skill levels identify where they could improve without needing addons to evaluate gear. It could also be possible to derive stat weights for use in addons like Pawn from this data, or even relationship formulae to drive better addons without the need for massive background data tables.

Variation Tolerance

Often with talent analysis, differences are expressed relative to the gap between the absolute best talent combination and no talents at all. This can show one talent doing only 80% of the damage of another, but if the talent tier only contributes 5-6% of the total damage, the variation comes down to a much smaller 1-1.2%. In other words, a 1000 DPS difference may appear big, but in the context of doing over 100,000 DPS it’s actually fairly small. Because of this I would argue that variations between talents and trinkets should be viewed in terms of the difference made to overall output, rather than narrowing the viewpoint down to the contribution solely attributable to the talent or trinket.
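The arithmetic behind those percentages is short enough to write out. This sketch just multiplies the shortfall between the two talents by the tier’s share of total damage; the function name is mine.

```python
def overall_difference(relative_output: float, tier_share: float) -> float:
    """Share of total damage lost by taking the weaker talent.

    relative_output: weaker talent's damage as a fraction of the stronger one.
    tier_share: fraction of total damage that the talent tier contributes.
    """
    return (1.0 - relative_output) * tier_share


low = overall_difference(0.80, 0.05)
high = overall_difference(0.80, 0.06)
print(f"{low:.1%} to {high:.1%}")  # 1.0% to 1.2%
print(f"{1000 / 100_000:.1%}")     # 1.0%
```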

The other problem is that while working out optimal rotations or combinations in SimCraft is good, sims aren’t a good representation of in-game realities. Differences of less than 5% in sims could be drastically reduced or even inverted depending on how far a particular boss fight differs from the closest approximate sim model.

If the community accepted a larger tolerance for variation, the negative responses highlighted in Zooper’s survey would hopefully diminish. With the intended design approach in Legion being opt-in complexity via talents, this may be necessary, as a less complicated rotation may end up returning better results for some players than a more complicated one, even though the more complicated rotation is the “best” one.

In Summary

While Zooper’s initial posts may have some flaws, she raised a good discussion topic which the community as a whole should learn from. A more open approach to the customisation choices other players make could reduce hostility and make the game a more accepting environment, which could do more to help the game in the long run than any design change.

Nothing is ever perfect and we should continually ask ourselves where we can do better, not just in terms of absolute numbers but also with how we share information or interact with other players.