I held a statistical wine tasting to teach you about rank-based statistics.

Ah, the sacrifices I make for science. Usually in these Methods Man articles, we use a recent publication as a springboard to talk about how statistics work. Today, we're changing that up just a bit. We're going to discuss how I applied rank-based statistics to a wine tasting party for 20 of our closest friends.

Before we get into the abnormal world of ranks, we need to know what normal is -- literally. "Normal" means something very specific in statistics. It's the bell-shaped curve you've probably seen before. A normal distribution looks something like this:

Figure 1: A population of individuals with an average age of 50. We call this shape a normal distribution.

Normal distributions give statisticians a warm fuzzy feeling because they can be specified with just two numbers. That's right, all that data from Figure 1 can be (almost) fully characterized if we know two things: first, the mean, which is 50 in this case, and second, the standard deviation, 10 in this case. That's it. Virtually all of the math behind common statistical tests assumes that some distribution of data can be expressed by those two numbers.

But what if your data is not normally distributed -- like, say, wine prices:

Figure 2: Distribution of wine prices among 15,446 wines sold in Pennsylvania State Liquor Stores. I stopped the scale at $200 because, come on. We can thank the Commonwealth's monopolistic control over liquor and wine distribution for this data, and for our inability to have my favorite wine (Dogwood Cellars Cabernet) delivered.

This type of data can't be neatly summed up by two numbers. And there are plenty of techniques to deal with that fact, all with their own limitations. In these situations, ranking is often appropriate.

The way ranking works, to stick with our wine example, is that we simply list the wines from least to most expensive (the size of the price gaps between them doesn't matter) and assign them the values 1, 2, 3, 4, and so on. The math behind this analysis treats an increase in price as meaningful, but ignores the actual amount of the increase.
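In code, ranking is a one-liner. Here's a minimal sketch using a handful of made-up prices (not the actual wines from the party):

```python
# Hypothetical wine prices, for illustration only
prices = [3.99, 7.49, 12.00, 24.99, 59.99]

# Sort the prices and assign ranks 1, 2, 3, ...
# The dollar gaps between wines vanish; only the order survives.
ranks = {price: i + 1 for i, price in enumerate(sorted(prices))}

for price in prices:
    print(f"${price:>6.2f} -> rank {ranks[price]}")
```

Note that the $35 jump from the fourth wine to the fifth counts exactly as much as the $3.50 jump from the first to the second: one rank.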

But we aren't in the business of ranking wines by price. We want to know how the price relates to something. In this case, subjective enjoyment.

Now, you'll be able to find plenty of articles out there (including this nice one) that suggest that blind tasters can't tell a $3 wine from a $300 wine. But I don't like these studies. They invariably ask people to taste a wine and rate it, from like 1 to 10 or something. Humans aren't good at this. What we're good at is comparing. The best study would be to have people blindly taste wines and ... wait for it ... rank them. Keep going back and forth from one to another until they decide which is top shelf and which one is the proverbial bottom of the barrel.

So that's just what we did.

All the data you are about to see is real; the names have been changed to protect the inebriated.

The problem with rating wines is that there is no gold standard. Sure, there are the Robert Parkers of the world, but even Parker doesn't rate every wine there is, and I couldn't afford the wines he does rate. So I decided that we would treat price as the gold standard (I know, a million reasons this is unfair, but what are you going to do?). The competition at the wine tasting would be to taste ten wines, blinded, and rank them by order of price.

Here's the setup:

Identical decanters. All red wines. This is all the participants got to know. But you get to see the data. Here are the wines, and prices I paid for them, ranked (of course) from cheapest to most expensive:

Table 1: Wines at the statistical wine tasting, by price.

My 20 invitees were there, sipping away, not knowing what they were drinking. They merely had to put the 10 wines in order based on how enjoyable the wines were and, by proxy, how expensive they were.

After all, the price-to-taste relationship is almost certainly not linear. I am just not convinced that the Duckhorn (while delicious) is 10 times better than the serviceable Barefoot Cellars Pinot Noir. But by using ranks, we avoid that problem. We suspect that more expensive wine will taste better, but it doesn't have to do so perfectly in line with the dollars spent.

One minor statistical tweak here, since we abandoned normal data long ago, is to use a branch of statistics that uses "non-parametric" tests (in this case a Spearman correlation coefficient). I mention that only so that if you see those terms in medical papers, you start thinking about ranks, non-linear relationships, and all the beautiful weirdness that biology involves.
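For the extra-curious, the Spearman coefficient is nothing more than a Pearson correlation computed on the ranks. For tie-free data it reduces to the classic shortcut rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), where d is the difference in ranks for each item. Here's a minimal sketch with invented numbers (in practice you'd reach for scipy.stats.spearmanr, which also handles ties):

```python
def rank(values):
    """Map each value to its 1-based rank (assumes no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman correlation via the tie-free shortcut formula."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# A perfectly monotonic but very non-linear relationship (made-up data):
prices = [4, 7, 12, 25, 60]
enjoyment = [1.1, 2.0, 2.1, 8.0, 9.5]
print(spearman(prices, enjoyment))  # 1.0
```

Notice that enjoyment doesn't rise in proportion to price at all, yet the correlation is a perfect 1.0. That's the whole point of working in ranks.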

Back to the party. Everyone ranks the wines based on how expensive they think they are. Great.

Now, how do we assess how "right" they are? Clearly, if someone put all ten in perfect order they'd win, but the chances of that happening are pretty small. There are 10! = 3,628,800 possible orderings of ten wines, so by chance alone the odds of a perfect ranking are 1 in 3,628,800.
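That 1-in-3,628,800 figure is just 10 factorial, the number of distinct ways to order ten wines, of which exactly one is right:

```python
import math

# Number of ways to order 10 wines; only one ordering is exactly right.
orderings = math.factorial(10)
print(f"1 in {orderings:,}")  # 1 in 3,628,800
```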

We could give a point for each one you got in the right place -- like if you ranked poor (but yummy) Pacific Peak Cabernet lowest, you'd get a point. But that's not really fair. What if someone (let's say Walt) swapped each adjacent pair of wines, from bottom to top, like this?

Table 2: Our intuition tells us that Walt did really well. But he didn't put any wine in the right place.

He'd be awarded no points (and may God have mercy on his soul).

But using rank-based statistics saves someone like Walt.
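We can see the rescue in action. Assuming Walt's sheet swapped every adjacent pair (2, 1, 4, 3, and so on), he scores zero exact matches, yet his Spearman correlation with the true order is a hefty 0.94:

```python
true_rank = list(range(1, 11))          # 1..10, cheapest to priciest
walt = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]  # each adjacent pair swapped

# Exact-match scoring: Walt gets nothing.
exact_matches = sum(t == w for t, w in zip(true_rank, walt))

# Spearman scoring: every rank is off by exactly 1, so the penalty is tiny.
n = len(true_rank)
d2 = sum((t - w) ** 2 for t, w in zip(true_rank, walt))
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))

print(exact_matches)   # 0
print(round(rho, 3))   # 0.939
```

Zero points under exact-match scoring, near-perfect under rank correlation. Our intuition and the statistics finally agree.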

Here's a scatterplot of how the best person at the party did. Let's call him Flynn:

Figure 3: Flynn's results. As price increases, Flynn ranks wines higher.

When we change the x-axis to reflect the wine's rank, instead of price, it's even clearer that he did well.

Figure 4: Flynn's results. Flynn tends to rank higher-priced wines higher.

Applying our statistical test, I can tell you that the chance of Flynn's performance being a statistical fluke was 2%. That's below a "significance" threshold, and enough to give Flynn the win, which entitled him to a gift certificate to an upscale cupcake joint. Some of you naysayers out there will point out that, since the party had lots of people, someone was bound to do well, even if no one had a clue what they were doing. You're right, but at the same time, somebody has to win.
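A quick back-of-the-envelope calculation shows how right the naysayers are. If all 20 tasters guessed completely at random, and each faced a conventional 5% false-positive threshold, the chance that at least one would clear the significance bar by luck alone is about 64% (assuming the guesses are independent):

```python
# Probability that at least one of 20 random guessers "wins"
# at a 5% significance threshold (independent tasters assumed).
p_threshold = 0.05
n_tasters = 20
p_at_least_one = 1 - (1 - p_threshold) ** n_tasters
print(round(p_at_least_one, 2))  # 0.64
```

This is the multiple-comparisons problem in miniature: run enough tests and something will look significant.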

The party as a whole didn't fare as well as Flynn in wine-ranking ability. Combining everyone's results suggested that my coterie was not much better at organizing wines by price than chance. To be fair, we are a group of medical/statistical nerds -- your results may vary.

While it's not clear that you should ever pay $60 for a bottle of wine, it is clear that using the right statistical tests can be a lot of fun.

If you want to host your own statistical wine tasting, all the information you need is at my website: www.methodsman.com. There you can download forms for your attendees, and a spreadsheet template for your data. Just send me an invite, OK?

The Methods Man is F. Perry Wilson, Assistant Professor of Medicine at the Yale School of Medicine. He earned his BA from Harvard University, graduating with honors with a degree in biochemistry. He then attended Columbia College of Physicians and Surgeons in New York City. From there he moved to Philadelphia to complete his internal medicine residency and nephrology fellowship at the Hospital of the University of Pennsylvania. During his research time, Wilson also obtained a Masters of Science in Clinical Epidemiology from the University of Pennsylvania. He is an accomplished author of many scientific articles and holds several NIH grants. If you'd like to see more of his work, please visit him at www.methodsman.com or follow @methodsmanmd on Twitter.