There are currently well over 100,000 votes on What to Brew, meaning it’s a treasure trove of data. But data is only as useful as its analysis. This article looks at my work to find groupings of similar homebrew additions based on how well they work with each style.

I decided to use k-means clustering to try to group the additions. In some ways, this data is not the best fit for the method: the data is noisy, and there aren’t discrete clusters. However, there were some interesting findings.

K-means clustering takes the What to Brew data and tries to find which additions are most similar to one another, based on the styles they do and don’t work with, then groups them into a set number of clusters. In other words, I ask the computer: “If you were to group these additions into this many clusters, how would you do it?”
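As a rough illustration of the idea, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is not the script used for this article, and the `scores` values and their three style columns are made up for the example; the real data would have one column per beer style.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per input vector."""
    rng = random.Random(seed)
    # Start from k randomly chosen vectors as the initial cluster centers.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assign every vector to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Move each center to the mean of the vectors assigned to it.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical data: each row is an addition, each column is how well it
# scores with one style (invented numbers, not real What to Brew votes).
scores = {
    "bourbon":    [0.9, 0.8, 0.1],
    "whiskey":    [0.9, 0.7, 0.1],
    "raspberry":  [0.2, 0.3, 0.9],
    "blackberry": [0.3, 0.3, 0.8],
}
names = list(scores)
labels = kmeans([scores[n] for n in names], k=2)

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

On this toy data the two spirits land in one cluster and the two berries in the other. With real, noisy votes the result can depend on the random starting centers, so it is worth trying several seeds.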

Determining the correct number of clusters is tricky: too few, and the groupings are too big to be useful; too many, and many of the additions end up in clusters by themselves, which isn’t useful either. I decided to run the script for every cluster count from 2 to 30 to see the results. Here are some of the significant groupings:

2 clusters

With just 2 clusters, there was 1 large cluster, and 1 smaller one that seems to group additions in a promising way, listed below. You could likely successfully mix any 2 of these.

rye, maple, raspberry, cherry, oak, bourbon, cinnamon, coffee, chocolate, hazelnut, orange peel, blackberry, smoke, pecan, vanilla, whiskey

3 clusters

This grouping was identical to the list above, with the exception that maple was put into its own cluster, where it remains for much of the remainder of the analysis. I’m not sure if this is due to it being significantly different from other additions, or if it’s due to it having fewer votes, as it was added to the database after other additions. (Go vote!)

8 clusters

As we get into more, smaller clusters, we see some more interesting patterns emerge. Weird additions that are unpopular cluster in #5. There seems to be a summer flavor grouping in #6. I’m intrigued by the grouping of #3: it seems like a mix of herbal and fruit ingredients.

0: lavender

1: rye, raspberry, cherry, oak, orange peel, blackberry

2: maple

3: lemon grass, apple, ginger, peach, grapefruit, seeds of paradise, blueberry, lemon peel, pear, elderflower, juniper berries, coriander, rose hips, apricot, cranberry

4: bourbon, cinnamon, coffee, chocolate, hazelnut, smoke, pecan, whiskey

5: piña colada, coconut, chai, bacon, mint, anise, chicory, cardamom, peppercorn, sweet potato, pumpkin, peanut butter

6: watermelon, hibiscus, chamomile, rhubarb, basil, cucumber, green tea, lemon pepper, plum, strawberry

7: vanilla

12 clusters

Smaller groupings are emerging as we get more clusters. Rye and oak (#0) are no longer grouped with the berry grouping (#4). The weird additions are still sticking together in #7. I’m curious why sweet potato and pumpkin aren’t clustering together. Other surprising ones that aren’t clustering together are chicory and coffee, and apricot and peach.

0: rye, oak

1: watermelon, rhubarb, cucumber

2: maple

3: coconut, chai, anise, chicory, cinnamon, cardamom, pumpkin

4: raspberry, cherry, orange peel, blackberry

5: bourbon, coffee, chocolate, hazelnut, smoke, pecan, whiskey

6: lemon grass, grapefruit, lemon peel, apricot

7: piña colada, bacon, mint, basil, peanut butter

8: vanilla

9: sweet potato

10: apple, lavender, hibiscus, chamomile, green tea, peppercorn, lemon pepper, rose hips

11: ginger, peach, seeds of paradise, blueberry, pear, elderflower, juniper berries, coriander, plum, strawberry, cranberry

20 clusters

At this point, we’re seeing more and more clusters with single additions. We’re also seeing some clusters being refined. Compare #3 from 12 clusters above with #16 below: coconut, pumpkin and cinnamon have split off, so they aren’t quite as similar to the other ingredients. We still have 2 larger clusters: #1 seems to be more tart/acidic, and #5 seems to be more citrusy/spicy.

0: raspberry, cherry, orange peel, blackberry

1: apple, watermelon, hibiscus, chamomile, rhubarb, green tea, peppercorn, lemon pepper, rose hips, strawberry

2: bacon, peanut butter

3: maple

4: cinnamon, pecan

5: lemon grass, ginger, peach, grapefruit, seeds of paradise, blueberry, lemon peel, pear, elderflower, juniper berries, coriander, apricot, cranberry

6: sweet potato

7: basil

8: coconut

9: rye

10: vanilla

11: plum

12: bourbon, coffee, whiskey

13: piña colada, mint, cucumber

14: chocolate, hazelnut

15: pumpkin

16: chai, anise, chicory, cardamom

17: lavender

18: oak

19: smoke
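The drift toward single-addition clusters can be tallied directly. The sketch below uses cluster sizes counted by hand from the 8-, 12- and 20-cluster lists in this article; the helper name `summarize` is just for illustration.

```python
# Cluster sizes counted from the 8-, 12- and 20-cluster lists above
# (each list covers all 54 additions).
reported_sizes = {
    8:  [1, 6, 1, 15, 8, 12, 10, 1],
    12: [2, 3, 1, 7, 4, 7, 4, 5, 1, 1, 8, 11],
    20: [4, 10, 2, 1, 2, 13, 1, 1, 1, 1, 1, 1, 3, 3, 2, 1, 4, 1, 1, 1],
}

def summarize(sizes):
    """Return (number of single-addition clusters, size of the largest cluster)."""
    singletons = sum(1 for s in sizes if s == 1)
    return singletons, max(sizes)

summary = {k: summarize(sizes) for k, sizes in reported_sizes.items()}
for k, (singletons, largest) in summary.items():
    print(f"k={k}: {singletons} single-addition clusters, largest has {largest} additions")
```

This gives 3 single-addition clusters at 8 and at 12 clusters, then 11 at 20 clusters, while the largest cluster only shrinks from 15 to 11 to 13 additions. That matches the trade-off described earlier: extra clusters mostly peel off individual additions rather than splitting the big groups.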

30 clusters

At 30 clusters, there are 18 clusters with single ingredients, which I’ve removed below. The ones that remain are likely quite similar. Note that #4 has stayed steady since 12 clusters. Bourbon and whiskey (#1) aren’t a surprise to see together. There are some really solid groupings still at this point.

0: hibiscus, strawberry

1: bourbon, whiskey

4: raspberry, cherry, orange peel, blackberry

5: piña colada, cucumber

6: bacon, peanut butter

9: apple, rhubarb, blueberry, pear, cranberry

15: lemon grass, grapefruit

16: peppercorn, lemon pepper

24: ginger, peach, lemon peel, elderflower, apricot

26: coffee, chocolate, hazelnut

27: seeds of paradise, juniper berries, rose hips

29: chai, chamomile, anise, chicory
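One way to spot groupings that stay steady across runs is to compare clusters as sets and keep the ones that appear unchanged every time. The sketch below does that with a few clusters copied from the 12-, 20- and 30-cluster lists in this article; it is an excerpt rather than the full results, and `persistent_groups` is a hypothetical helper name.

```python
# A few multi-addition clusters copied from the lists in this article.
# This is an excerpt: the other clusters are left out to keep it short.
runs = {
    12: [
        {"rye", "oak"},
        {"raspberry", "cherry", "orange peel", "blackberry"},
        {"bourbon", "coffee", "chocolate", "hazelnut", "smoke", "pecan", "whiskey"},
    ],
    20: [
        {"raspberry", "cherry", "orange peel", "blackberry"},
        {"bourbon", "coffee", "whiskey"},
        {"chocolate", "hazelnut"},
        {"bacon", "peanut butter"},
    ],
    30: [
        {"raspberry", "cherry", "orange peel", "blackberry"},
        {"bourbon", "whiskey"},
        {"coffee", "chocolate", "hazelnut"},
        {"bacon", "peanut butter"},
    ],
}

def persistent_groups(runs):
    """Clusters that appear with exactly the same members in every run."""
    as_sets = [{frozenset(c) for c in clusters} for clusters in runs.values()]
    return set.intersection(*as_sets)

stable_all = persistent_groups(runs)
stable_late = persistent_groups({k: runs[k] for k in (20, 30)})
print(sorted(sorted(group) for group in stable_all))
print(sorted(sorted(group) for group in stable_late))
```

Across all three runs only the raspberry, cherry, orange peel and blackberry group survives intact. Between the 20- and 30-cluster runs, bacon and peanut butter also hold together.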

50+ clusters

I also ran a few tests on high numbers of clusters to see what still remained close. At 50 clusters, the following were still paired:

raspberry, blackberry

bourbon, whiskey

blueberry, elderflower

lemon grass, grapefruit

I wanted to see which two of the 54 additions were the closest, so I ran it with 53 clusters, and wasn’t surprised to see that the two closest additions were:

bourbon, whiskey
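Running k-means with one fewer cluster than there are additions is an indirect way to find the closest pair, and since k-means depends on its random starting points it is not guaranteed to merge the truly closest two. A more direct check is to compare every pair of additions. The sketch below assumes each addition has a vector of per-style scores; the numbers are invented for illustration, so the toy result matches the article's only by construction.

```python
import math
from itertools import combinations

# Hypothetical per-style scores (made-up numbers, not real vote data).
scores = {
    "bourbon":    [0.9, 0.8, 0.1],
    "whiskey":    [0.9, 0.7, 0.1],
    "raspberry":  [0.2, 0.3, 0.9],
    "blackberry": [0.3, 0.3, 0.8],
    "lavender":   [0.1, 0.9, 0.4],
}

def closest_pair(scores):
    """Return the two addition names whose score vectors are nearest."""
    return min(
        combinations(scores, 2),
        key=lambda pair: math.dist(scores[pair[0]], scores[pair[1]]),
    )

pair = closest_pair(scores)
print(pair)
```

With 54 additions this is only 1,431 comparisons, so the brute-force approach is cheap.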

Conclusion of homebrew addition clusters:

K-means clustering provides some interesting insights into the data. While there are some surprising connections I wouldn’t have thought about, the groupings make sense in a way that validates the data. Further analysis could be done with other clustering algorithms.

If you’d like to see the full results, you can download all the iterations here.