Last updated on August 5, 2020

By Arseny Khakhalin, Dan Troha, and Bobby Mills

Believe it or not, we’re already four weeks deep into the Guilds of Ravnica draft environment! We’ve had a couple of Grand Prix tournaments, tons of drafting on Arena and Magic Online, and of course there’s still a Pro Tour coming up in a couple weeks.

Much has been made around the internet about ranking the various guilds and colors in GRN. We’ve seen plenty of people say that “green is bad” and have seen others respond with “you just don’t know how to draft green.” So it’s fair to say there’s been a bit of controversy about how to approach the format.

And that’s what we’re going to get into today. Not only will you get some great new visualizations of the overall Guilds draft format, we’re going to dive deep into a new draft statistic we’ve developed: the controversy of a card among drafters.

For today’s article, we took our data from around 118,000 drafts on Draftsim – collected throughout the first week of October. As before, we anonymously saved the cards and the order in which you picked them given the other cards presented to you in the pack.

Synergies

Let’s start with our now-familiar visualization of card synergies. In case you missed our description of how we previously created them, check out this summary. In short, each card on this plot is represented by a point matching its color. The distances between the points reflect how frequently any two cards are drafted together; the more often they end up in the same deck, the closer they are shown on the plot.

The axes here aren’t labeled because the values they represent are not really interpretable. For this type of analysis, only the relative position of different points matters. Colorless cards and nonbasic lands are shown in gray, and multicolored cards are all shown in purple.
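For the curious, here is a minimal sketch of the counting step behind a plot like this. It is not our actual pipeline (that is described in the linked summary): the function names, the toy pools, and the choice of Jaccard distance are all assumptions made for illustration. A layout algorithm would then place the points so that on-screen distances roughly match these pairwise distances.

```python
from collections import Counter
from itertools import combinations

def co_draft_counts(pools):
    """Count how many final pools contain each card and each unordered card pair."""
    pair_counts = Counter()
    card_counts = Counter()
    for pool in pools:
        cards = sorted(set(pool))            # ignore duplicate copies within a pool
        card_counts.update(cards)
        pair_counts.update(combinations(cards, 2))
    return pair_counts, card_counts

def jaccard_distance(a, b, pair_counts, card_counts):
    """0.0 = always drafted together, 1.0 = never drafted together (assumed metric)."""
    together = pair_counts[tuple(sorted((a, b)))]
    either = card_counts[a] + card_counts[b] - together
    return 1.0 - together / either if either else 1.0

# Hypothetical toy pools, not real Draftsim data.
pools = [
    ["Ionize", "Leapfrog", "Electrostatic Field"],
    ["Ionize", "Electrostatic Field", "Erratic Cyclops"],
    ["Whispering Snitch", "Thoughtbound Phantasm"],
]
pair_counts, card_counts = co_draft_counts(pools)
close = jaccard_distance("Ionize", "Electrostatic Field", pair_counts, card_counts)  # 0.0
far = jaccard_distance("Ionize", "Whispering Snitch", pair_counts, card_counts)      # 1.0
```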

First up – drumroll please… the synergy plot for GRN:

Isn’t it lovely? For you to appreciate just how different GRN is from a “normal set,” recall the synergy footprints for all four sets we’ve analyzed so far:

While DOM and M19 were more or less “normal” sets, and RIX/XLN was clearly a tribal block, GRN is built around guilds. What’s the difference between a tribal block and a guild one? What makes the synergy landscape so different?

Two things probably: one is that in GRN an awesome bomb card does not lock you into a guild that firmly. As long as your mana base supports the awesome card, you can still experiment. An Aurelia, Exemplar of Justice you drafted will still mentor cats, if necessary. It doesn’t force you to only draft angels, or mentor cards, as was the case with Ixalan. The second important difference is that two of the tribes in XLN spanned three different colors, while the guilds are all nice and evenly spread, like a flower.

With its symmetrical shape and tight card clusters, almost no single card breaks formation. Rare exceptions are the two somewhat lonely red points at the very bottom that are biased towards Izzet. These are, of course, Electrostatic Field and Erratic Cyclops — cards that strongly benefit from being in a spells-based blue/red deck.

And you can see a black card on the very right that is attracted to Dimir: it’s Whispering Snitch, with its surveil synergy.

It really is astounding to look at the way single-colored cards sort themselves out toward their stronger color pairing. In blue, Leapfrog and Maximize Altitude pull towards Izzet while Thoughtbound Phantasm and Enhanced Surveillance push up against the Dimir pile.

You can see a large 2000×1500 px file with all cards labeled here:

And we’ve also got an interactive visualization you can see here:

Color Pairs

We also looked at the color pairs that were drafted most often, as we did for M19 in our last post. Remember that for each final pool of cards drafted, we determined the color that was drafted the most (“Color 1”) and the next most abundant color (“Color 2”). Then we built a heatmap, showing which combinations were drafted with the highest frequency and which were the least favored.
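A minimal sketch of that bookkeeping might look like the following. The function names and the toy pools are hypothetical, and the way multicolored cards and ties are counted is an assumption of this sketch rather than a description of our code.

```python
from collections import Counter

def top_two_colors(pool_colors):
    """Return (Color 1, Color 2): the most and second-most drafted colors in a pool.

    pool_colors holds one color letter per colored card picked, e.g. "U" or "B".
    Ties are broken by first appearance, which is an arbitrary choice.
    """
    ranked = Counter(pool_colors).most_common(2)
    first = ranked[0][0] if ranked else None
    second = ranked[1][0] if len(ranked) > 1 else None
    return first, second

def pair_frequencies(pools):
    """Share of pools for each ordered (Color 1, Color 2) pair: the heatmap cells."""
    counts = Counter(top_two_colors(pool) for pool in pools)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

# Hypothetical toy pools, not real Draftsim data.
pools = [list("UUUUUBBB"), list("UUUUBB"), list("BBBBUU"), list("RRRWW")]
freqs = pair_frequencies(pools)
# freqs[("U", "B")] is 0.5 and freqs[("B", "U")] is 0.25
```

Keeping the pair ordered is what lets the heatmap show an asymmetry between blue-heavy and black-heavy Dimir.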

The plot for GRN looks pretty strange, but the sparseness is by design. Nobody is trying to draft Simic this time around, so only some color pairs in the heatmap table are even shaded:

What do we see here? Dimir (and more specifically, Dimir heavily biased towards blue) is by far the most popular. Note how strong the asymmetry is: 20% of decks feature lots of blue spells and some black, but only 7%, almost 3 times fewer, feature lots of black and some blue. Perhaps this is because both blue guilds are considered to be “good,” while one of the black guilds (Golgari) has been widely criticized for being underpowered. So people hedge by starting with being blue-based.

The next most popular guild is Boros (red-skewed), then Izzet (blue-skewed), and Selesnya (green-skewed). Golgari, on the other hand, though it may be strong in constructed, came in last place in our popularity contest.

Interestingly, it took our drafters some time to come to terms with the fact that green is a tough color to draft in this environment. If we look at how the popularity of different colors changed during the first several thousand drafts, we see that green was becoming ever so slightly less popular, while black grew on people. The effect is weak, but it is definitely there.

Controversial Cards

We also had a new idea this time: to look for cards that are “controversial” in terms of their relative rank and rating. The intuition is simple: obviously, there are cards which everyone agrees are great and are always drafted first. If you open Dream Eater, you obviously take it, it’s a no-brainer, and it will probably strongly influence you to draft blue for the rest of the draft.

On the flipside, there are cards that are obviously “bad” that no one wants to take. These are the chaff, the very last cards in a booster. There are also cards in the middle that are sort of OK, and which everybody agrees are just OK. Think Devkarin Dissident, for example.

But we thought that there could be other cards that are drafted early by some people and hated by others, thus dividing the drafting community. And wouldn’t it be fun to identify them?

Now, while the idea is simple, the analysis turned out to be unexpectedly hard and weird. But if you like stats and data science, bear with us.

Calculating pick variance

To find controversial cards, we looked at the average pick number within the draft at which each card was taken, the same way we did to compare card rankings for previous sets.

But this time, we also calculated the variance of when each card was taken. For example, if a card is always picked first, its mean pick order would be 1, and the variance would be 0. If the card is picked 2nd in 50% of cases, and 3rd in 50% of cases, the average pick order would be 2.5, and the variance, for large sample sizes, would approach 0.25 (var([2 3 2 3…]) ~ (0.5)^2). We calculated these means and variances for every card, and plotted the variances against the mean, which gave us this graph:

As you can see from the legend, the color scheme is new here: black is for commons, gray for uncommons, gold for rares, and red for mythics.

The cards in the bottom left corner are the awesome mythics and rares, like Doom Whisperer, Dream Eater, and Aurelia, Exemplar of Justice. Everybody likes them, everybody takes them the moment they see them, so they have an average pick order of 1, and a variance of 0 (everybody agrees that they are the first cards to be picked).

On the other end of the spectrum there are points on the very right that represent universally unloved commons: Pause for Reflection, Vicious Rumors, and all five lockets. These cards are always picked last — or almost last — and people more or less agree that it is appropriate to pick them near-last.

Between these two extremes, we have a nice gentle curve. Cards in the middle have higher variance — people feel less strongly about drafting them earlier or later. They are more situational and are also dependent on the guild or deck type the drafter is shooting for. As a result, the more “average” the card is, the higher variance of drafting order it has. Which gives you this pretty hill shape, with the highest variance around the middle.
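If you want to reproduce the two numbers behind each point on the plot, a minimal sketch could look like this. The log format and card names are made up for illustration. We use the population variance here, which gives exactly 0.25 for the second-or-third example above, while the sample variance only approaches 0.25 as the sample grows.

```python
from collections import defaultdict
from statistics import mean, pvariance

def pick_stats(pick_log):
    """pick_log: iterable of (card_name, pick_number). Returns {card: (mean, variance)}."""
    by_card = defaultdict(list)
    for card, pick in pick_log:
        by_card[card].append(pick)
    return {card: (mean(picks), pvariance(picks)) for card, picks in by_card.items()}

# Hypothetical log: one card always taken first, one taken 2nd or 3rd equally often.
log = [("Always First", 1)] * 4 + [("Second Or Third", 2), ("Second Or Third", 3)] * 2
stats = pick_stats(log)
# stats["Always First"]    is (1, 0)
# stats["Second Or Third"] is (2.5, 0.25)
```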

But then you can notice that there is some extra structure to the hill-shaped plot. Notice for example that commons form a band, uncommons sit on top of this band, and rares are pushed to the upper edge of the uncommons. Why is that?

Before we discuss which cards caused variance among humans, we first need to see where variance occurs naturally.

Inherent draft variance

At first we thought that it may be an artifact of human psychology, but then realized that the stratification appears in this plot even if we analyze decks drafted by Draftsim bots (below). That means that it has something to do with a card’s power, color, and its relative frequency, as these are the only three things our bots “know.”

So, where do these bands come from? They seem to be a byproduct of the drafting dynamics themselves. Imagine a card that is reasonably good, but narrow and demanding: something like a multicolored rare, or the very mono-colored card Gigantosaurus from M19. A player who opens this card in a booster would rarely pick it first, as it’s not good enough to commit to a guild or color. Such a card would get passed around the table, until it hits a player who is already drafting this color or guild anyway, at which point they’ll probably take it.

For these narrow, situational cards that are decent in some decks but horrible in others, a mediocre average score comes from a mix of early on-guild (or on-color) and late off-guild (or off-color) picks. These are “cards of opportunity” that either ask a lot out of your deck (like Thousand-Year Storm) or just have a very limited application (like Drowned Secrets). As an example, here’s a histogram of pick frequency for one of the most controversial cards in GRN, Ionize:

Here’s a practical scenario for you that is hopefully easy to relate to. Imagine that 8 people sit at a table to draft. They open 8 boosters, and one of the boosters has Ionize (a high-variance card) as the pack’s best card in Izzet. The card’s two-colored casting cost (and solid-but-not-amazing power level) makes it not good enough to commit to Izzet right away. There are almost certainly more flexible or powerful cards in the booster, so the player that opens it will probably pass it to the next person to the left.

That person will look at the Ionize, and if they are drafting Izzet, they will be happy to take it. But if they are drafting anything else, they will pass it along, because this card has a low chance of fitting their deck. So it will move across the table until it finds a player drafting Izzet, or until it is drafted as an off-color card towards the end of the booster, because no better options are available.

There is almost no competition for these cards, which leads to this flat histogram, and to a cliff-like decrease in probability of being drafted between the 8th pick (almost full cycle) and the 9th pick (beginning of the next cycle). For a card like that to be drafted on the second cycle, someone would have to pass it the first time around, but later firmly settle on Izzet by the time the booster goes around the second time; something that does not happen that often.

For bears, decent commons, and other “so-so” cards, the situation is very different: you never draft them first (they are not good enough for that), and you almost never get to draft them last, as they fit in many decks, and somebody will almost certainly draft them at some point. Which means that if they have an average score, that’s because they are actually average!

These cards are competitively drafted by many players who compare them to other on-color cards in each booster. As the outcome of these comparisons varies from draft to draft, the histogram of pick orders looks vaguely normal-ish (the Central Limit Theorem starts to kick in), centered somewhere in the middle of the rank scale.

For example, here’s a histogram for one of the least controversial cards with about the same average pick order as Ionize (around 7th), Fresh-Faced Recruit:

So Wizards R&D prints more situational and complex cards at rare or uncommon, creating these bands of gray and gold above the curve of black (commons).
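To see why a flat histogram lands higher on the variance axis than a bell-shaped one with the same average pick, here is a small sketch with two invented histograms. These shapes are assumptions chosen for easy arithmetic, not the real Ionize or Fresh-Faced Recruit data.

```python
def hist_mean_var(hist):
    """hist maps pick number -> count. Returns (mean, population variance)."""
    n = sum(hist.values())
    mu = sum(pick * count for pick, count in hist.items()) / n
    var = sum(count * (pick - mu) ** 2 for pick, count in hist.items()) / n
    return mu, var

# Invented shapes with the same mean pick of 7.
flat = {pick: 1 for pick in range(2, 13)}   # picks 2..12 equally likely
bell = {5: 1, 6: 2, 7: 3, 8: 2, 9: 1}       # peaked around pick 7

flat_mean, flat_var = hist_mean_var(flat)   # 7.0 and 10.0
bell_mean, bell_var = hist_mean_var(bell)   # 7.0 and about 1.33
```

Same average pick, very different spread: the "card of opportunity" shape sits far above the competitively drafted one.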

This is our best explanation for the fancy Var/Mean curve, and if you have a better one, please let us know in the comments! The effect is real, and has nothing to do with psychology, but it is also tricky, and hard to wrap your mind around.

OK – here’s the list

But enough with all this talk; what are the most controversial cards? Taking into account the baked-in draft dynamics we just described, the list of controversial cards consists of medium power mythics and rares, mostly two-color, that are situationally good. With raw pick variances, the cards score high on this “controversy” scale not because they polarize drafters necessarily, but because of the logic of the drafting “game.” For GRN, the most “high-variance” or “volatile” cards are:

The least controversial cards, on this scale, are:

Human-induced variance

Unsatisfied with this scale, we kept thinking: can we still somehow look past the drafting mechanics and quantify the uniquely human contribution to variance?

How do you find results that do not stem from permutations and distributions, but that are driven by psychology and differences in human opinion?

We realized that we can actually estimate that, because we track both human drafts and bot drafts! All effects of stats, distributions, and relative scores should be the same for bots and humans (except that the ratings used by the bots, and average opinions used by human players, would of course be slightly different).

Humans, however, add some personal variability to their actions, whereas bots always use the same ranking system, carefully assigned by Dan Troha. So we can estimate human volatility by subtracting bot pick variability from human pick variability! Here’s the result of this analysis for GRN:

On this plot, the horizontal axis shows how much earlier humans drafted each card, compared to bots. If on average humans drafted something 2 picks earlier, it would have an X of 2 here. The vertical axis shows how much less certain humans were about this card, measured as the extra variance they brought to its pick order.

For example, judging from the yellow cloud on the right side of this square, humans tend to raredraft: they pick junk rares and mythics 2-3 picks earlier than our bots are set to. Humans are also more variable in these preferences, probably because some users raredraft and some don’t. In fact, there’s been a long-running debate among Draftsim users about whether the bots should raredraft, since many people do in “real life” scenarios.
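The subtraction behind this plot can be sketched as follows. This is a simplified illustration under our own assumptions: "variability" is taken to mean variance, and the data layout, function name, and sample numbers are hypothetical.

```python
from statistics import mean, pvariance

def human_minus_bot(human_picks, bot_picks):
    """Both arguments map card name -> list of pick numbers.

    Returns {card: (picks_earlier, extra_variance)} for cards present in both:
      picks_earlier  = bot mean - human mean   (positive: humans take it earlier)
      extra_variance = human variance - bot variance
    """
    result = {}
    for card in human_picks.keys() & bot_picks.keys():
        h, b = human_picks[card], bot_picks[card]
        result[card] = (mean(b) - mean(h), pvariance(h) - pvariance(b))
    return result

# Hypothetical picks for one junk rare, not real Draftsim data.
human = {"Junk Rare": [3, 5, 9, 11]}   # mean 7, variance 10
bot = {"Junk Rare": [8, 9, 10, 9]}     # mean 9, variance 0.5
diff = human_minus_bot(human, bot)
# diff["Junk Rare"] is (2, 9.5): two picks earlier, with 9.5 extra variance
```

A card with both numbers positive would land in the top right quadrant of the plot.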

And so, finally, here are the most controversial cards, the points in the right top quadrant:

As you can see, most cards that are hard to incorporate in a deck because they are situational and narrow (as we described above) were also confusing for human players. In other words, these are the weird rares.

Which, in hindsight, makes lots of sense. When we are faced with tough decisions, we each solve them differently. That’s what is nice about humans: some like to take risks, some are careful; some stick to strategies that worked in the past, some like to experiment.

So when cards were hard to fit in a deck, this “volatility” was further amplified by human psychology, making them “controversial” not just in a mechanical sense, but also in the psychological sense of this word.

We then looked at the most controversial commons:

Now, this is interesting! In raw, absolute numbers, the five lockets were relatively less controversial than most other commons (remember, they all sit in the right bottom corner of the inverted parabola). Yet if you look past the systemic variance into human-specific variance, you actually see that human players had different strategies here. Some were drafting lockets, and some were not!

People love to experiment when they use a draft simulator, and drafting a “sweet” four or five-color locket deck is a nice thing to try out. People were also fairly uncertain about where the lockets fell in terms of power level at the beginning of the format.

For the sake of completeness, here are the most controversial uncommons:

A nerdy footnote on mathematical paradoxes

Now, here is an even more mathy aside, for those of you who like to read Frank Karsten and Felix Weidemann for fun. Did you notice that the plot of human-controversial cards actually looks a bit weird, as humans seem to draft virtually all cards earlier than bots do? Look at it again: most of the points have positive X values, meaning that they are drafted by humans a few picks earlier (anything from a fraction of a pick to 5 picks for Divine Visitation).

When we saw this plot for the first time, we thought that it may be a bug in the code, because how could it possibly be that humans would draft all cards earlier? Surely if they draft some cards earlier, they would draft some other cards later, right? The average values should be the same for humans and bots, as they are all in the same boat, subject to the same rules, wouldn’t you agree?

It turns out that it is actually not a bug, but rather a real full-blown paradox, directly related to Simpson’s Paradox (check out the UC Berkeley admissions controversy, which is closely related to our problem). It turns out that it is possible for one player to draft all or almost all cards earlier, on average, than another player. It is actually a sign of inefficiency. Humans are inefficient when compared to bots — but we knew that!

If you find this counterintuitive and hard to believe, here’s a small toy example. Say two players are drafting from boosters of 4 cards, and there are only two types of cards, A and B, where A is more powerful than B. They draft twice and end up with the following final draft pools (cards listed in the order they were picked):

Draft 1: Player 1: AB, Player 2: AB

Draft 2: Player 1: BB, Player 2: AA

Let’s calculate the average pick order for both cards and both players. Player 1 took A only once, as the first pick of Draft 1, so its average pick order is 1. Player 1 took B three times (second in Draft 1, then first and second in Draft 2), for an average of (2+1+2)/3 ≈ 1.7. Player 2 took A three times (first in Draft 1, then first and second in Draft 2), for an average of (1+1+2)/3 ≈ 1.3, and took B once, as a second pick, for an average of 2.

If you compare these averages, the second player drafts both card A and card B later, on average, than the first player does (because 1.3 > 1, and 2 > 1.7). And, incidentally, the second player also comes out ahead overall: AB against AB is a draw, but AA wins over BB.
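If you would rather check the arithmetic than take our word for it, here is a short sketch that recomputes the averages from the pools. It assumes each pool string lists the cards in the order they were picked; the function name is ours.

```python
from collections import defaultdict

def average_pick_order(pools):
    """pools: list of strings, each the cards one player took, in pick order."""
    picks = defaultdict(list)
    for pool in pools:
        for position, card in enumerate(pool, start=1):
            picks[card].append(position)
    return {card: sum(p) / len(p) for card, p in picks.items()}

player1 = average_pick_order(["AB", "BB"])   # A: 1.0,  B: 5/3 (about 1.7)
player2 = average_pick_order(["AB", "AA"])   # A: 4/3 (about 1.3),  B: 2.0

# Player 2 takes every card type later on average, yet ends up with more A's.
later_for_all = all(player2[card] > player1[card] for card in "AB")
```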

So, surprisingly, the “Unique Human Preference” plot above not only illustrates some features of human psychology, such as raredrafting, but also testifies to the average inefficiency of human drafts compared to bots. While humans are hunting for fancy cards, bots calmly draft more efficient cards, resulting in more consistent draft pools, and most probably, more powerful decks overall (although we didn’t try to directly estimate that).

Concluding Thoughts

Given that one of the big reasons that you like to use Draftsim is to get better and learn how to improve at drafting, what can you take away strategically from all this?

If you want a quick idea of the format, I highly suggest looking closely at the mono-colored cards on the synergy plot with all the cards labeled to see what guilds they are most closely associated with. This will give you probably the quickest overview of “how to draft the set” because you’ll see which cards work best in conjunction with which other ones.

Secondly, talk to your friends and people you play with about the list of controversial cards. Have they put them to good use? What kinds of decks want them? When can you expect to get them in the pack and how early do you need to draft them? That way when you’re at your next FNM, you’ll know whether to take that weird rare or whether you should just pass it on to the next person.

(Unless you are a raredrafting human, that is!)

If you enjoyed this article, be sure to follow Draftsim on Facebook and Twitter and please help out the site via Patreon if you can.