As omnivores, humans have historically faced the difficult task of identifying and gathering food that satisfies nutritional needs while avoiding foodborne illnesses1. This process has contributed to the current diet of humans, which is influenced by factors ranging from an evolved preference for sugar and fat to palatability, nutritional value, culture, ease of production and climate1,2,3,4,5,6,7,8,9. The relatively small number of recipes in use (∼10^6, e.g. http://cookpad.com) compared to the enormous number of potential recipes (>10^15, see Supplementary Information Sec S1.2), together with the frequent recurrence of particular combinations in various regional cuisines, indicates that we are exploiting but a tiny fraction of the potential combinations. Although this pattern itself can be explained by a simple evolutionary model10 or data-driven approaches11, a fundamental question still remains: are there any quantifiable and reproducible principles behind our choice of certain ingredient combinations and avoidance of others?

Although many factors such as colors, texture, temperature and sound play an important role in food sensation12,13,14,15, palatability is largely determined by flavor, representing a group of sensations including odors (due to molecules that can bind olfactory receptors), tastes (due to molecules that stimulate taste buds) and freshness or pungency (trigeminal senses)16. Therefore, the flavor compound (chemical) profile of the culinary ingredients is a natural starting point for a systematic search for principles that might underlie our choice of acceptable ingredient combinations.

A hypothesis, which over the past decade has received attention among some chefs and food scientists, states that ingredients sharing flavor compounds are more likely to taste good together than ingredients that do not17 (also see http://www.foodpairing.com). This food pairing hypothesis has been used to search for novel ingredient combinations and has prompted, for example, some contemporary restaurants to combine white chocolate and caviar, as they share trimethylamine and other flavor compounds, or chocolate and blue cheese, which share at least 73 flavor compounds. As we search for evidence supporting (or refuting) any ‘rules’ that may underlie our recipes, we must bear in mind that the scientific analysis of any art, including the art of cooking, is unlikely to be capable of explaining every aspect of the artistic creativity involved. Furthermore, there are many ingredients whose main role in a recipe is not limited to flavoring (e.g. eggs ensure mechanical stability and paprika adds vivid color). Finally, the flavor of a dish owes as much to the mode of preparation as to the choice of particular ingredients12,18,19. However, our hypothesis is that, given the large number of recipes we use in our analysis (56,498), such factors can be systematically filtered out, allowing for the discovery of patterns that may transcend specific dishes or ingredients.

Here we introduce a network-based approach to explore the impact of flavor compounds on ingredient combinations. Efforts by food chemists to identify the flavor compounds contained in most culinary ingredients allow us to link each ingredient to 51 flavor compounds on average20. We build a bipartite network21,22,23,24,25,26 consisting of two different types of nodes: (i) 381 ingredients used in recipes throughout the world and (ii) 1,021 flavor compounds that are known to contribute to the flavor of each of these ingredients (Fig. 1A). A projection of this bipartite network is the flavor network in which two nodes (ingredients) are connected if they share at least one flavor compound (Fig. 1B). The weight of each link represents the number of shared flavor compounds, turning the flavor network into a weighted network27,22,23. While the compound concentration in each ingredient and the detection threshold of each compound should ideally be taken into account, the lack of systematic data prevents us from exploring their impact (see Sec S1.1.2 on data limitations).
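The projection step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the ingredient names, the placeholder compound labels and the function name `project_flavor_network` are hypothetical and are not taken from the data set used in the paper.

```python
from itertools import combinations

# Hypothetical toy data: ingredient -> set of flavor compounds.
# The compound labels are placeholders, not real chemistry.
ingredient_compounds = {
    "tomato": {"compound_a", "compound_b", "compound_c"},
    "cheese": {"compound_b", "compound_c", "compound_d"},
    "garlic": {"compound_e"},
}

def project_flavor_network(ingredient_compounds):
    """Project the ingredient-compound bipartite network onto ingredients.

    Returns a dict mapping each unordered ingredient pair to the number
    of flavor compounds the two ingredients share. Pairs that share no
    compound get no link.
    """
    weights = {}
    for a, b in combinations(sorted(ingredient_compounds), 2):
        shared = ingredient_compounds[a] & ingredient_compounds[b]
        if shared:
            weights[frozenset((a, b))] = len(shared)
    return weights

flavor_network = project_flavor_network(ingredient_compounds)
```

In this toy example only tomato and cheese are linked, with weight 2, while garlic stays isolated because it shares no compound with the other two ingredients. Comparing all pairs is quadratic in the number of ingredients, which is unproblematic for a few hundred ingredients.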

Figure 1 Flavor network. (A) The ingredients contained in two recipes (left column), together with the flavor compounds that are known to be present in the ingredients (right column). Each flavor compound is linked to the ingredients that contain it, forming a bipartite network. Some compounds (shown in boldface) are shared by multiple ingredients. (B) If we project the ingredient-compound bipartite network into the ingredient space, we obtain the flavor network, whose nodes are ingredients, linked if they share at least one flavor compound. The thickness of links represents the number of flavor compounds two ingredients share and the size of each circle corresponds to the prevalence of the ingredients in recipes. (C) The distribution of recipe size, capturing the number of ingredients per recipe, across the five cuisines explored in our study. (D) The frequency-rank plot of ingredients across the five cuisines shows an approximately invariant distribution across cuisines.

Since several flavor compounds are shared by a large number of ingredients, the resulting flavor network is too dense for direct visualization (average degree ). We therefore use a backbone extraction method28,29 to identify the statistically significant links for each ingredient given the sum of weights characterizing the particular node (Fig. 2; see SI for details). Not surprisingly, each module in the network corresponds to a distinct food class such as meats (red) or fruits (yellow). The links between modules inform us of the flavor compounds that hold different classes of foods together. For instance, fruits and dairy products are close to alcoholic drinks, and mushrooms appear isolated, as they share a statistically significant number of flavor compounds only with other mushrooms.
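The exact backbone algorithm is deferred to the SI and the cited references. As a rough illustration of the idea of testing each link against the total weight of its node, the sketch below uses a disparity-filter style test, which is one common backbone method for weighted networks. That choice is an assumption made here, and the function name `disparity_backbone`, the edge representation and the toy data are hypothetical. The default threshold of 0.04 mirrors the p-value quoted in the caption of Fig. 2.

```python
from collections import defaultdict

def disparity_backbone(weights, alpha=0.04):
    """Keep links that are significant for at least one endpoint.

    weights: dict mapping an undirected edge (a, b) to its weight.
    For a node with degree k and total weight s, a link of weight w
    gets the p-value (1 - w / s) ** (k - 1), i.e. the probability of
    a normalized weight at least that large if s were split uniformly
    at random among the k links.
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        for node in (a, b):
            strength[node] += w
            degree[node] += 1

    def p_value(node, w):
        k = degree[node]
        if k < 2:
            return 1.0  # a single link cannot be tested against the null
        return (1.0 - w / strength[node]) ** (k - 1)

    return {
        (a, b)
        for (a, b), w in weights.items()
        if min(p_value(a, w), p_value(b, w)) < alpha
    }

# Hypothetical toy network: one heavy link and nine light ones on a hub.
toy = {("hub", "strong"): 100}
toy.update({("hub", f"weak{i}"): 1 for i in range(9)})
backbone = disparity_backbone(toy)
```

In the toy network only the heavy link survives, because it carries far more of the hub's total weight than a uniform random split would predict, whereas each light link is unremarkable.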

Figure 2 The backbone of the flavor network. Each node denotes an ingredient, the node color indicates food category and node size reflects the ingredient prevalence in recipes. Two ingredients are connected if they share a significant number of flavor compounds, link thickness representing the number of shared compounds between the two ingredients. Adjacent links are bundled to reduce the clutter. Note that the map shows only the statistically significant links, as identified by the algorithm of Refs.28,29 for p-value 0.04. A drawing of the full network is too dense to be informative. We use, however, the full network in our subsequent measurements.

The flavor network allows us to reformulate the food pairing hypothesis as a topological property: do we more frequently use ingredient pairs that are strongly linked in the flavor network or do we avoid them? To test this hypothesis we need data on ingredient combinations preferred by humans, information readily available in the current body of recipes. For generality, we used 56,498 recipes provided by two American repositories (epicurious.com and allrecipes.com) and, to avoid a distinctly Western interpretation of the world's cuisine, we also used a Korean repository (menupan.com). The recipes are grouped into geographically distinct cuisines (North American, Western European, Southern European, Latin American and East Asian; see Fig. 1 and Table S2). The average number of ingredients used in a recipe is around eight and the overall distribution is bounded (Fig. 1C), indicating that recipes with a very large or very small number of ingredients are rare. By contrast, the popularity of specific ingredients varies over four orders of magnitude, documenting huge differences in how frequently various ingredients are used in recipes (Fig. 1D), as observed in ref. 10. For example, jasmine tea, Jamaican rum and 14 other ingredients are each found in only a single recipe (see SI S1.2), but egg appears in as many as 20,951, more than one third of all recipes.
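The two summary statistics discussed above, the recipe size and the frequency rank of ingredients, can be computed from a recipe collection as sketched below. The toy recipes and the function name `recipe_statistics` are hypothetical, and the real repositories would need parsing and ingredient-name normalization that is not shown here.

```python
from collections import Counter

# Hypothetical toy recipes, each represented as a set of ingredients.
recipes = [
    {"egg", "flour", "butter"},
    {"egg", "milk"},
    {"tomato", "garlic", "olive oil", "basil"},
]

def recipe_statistics(recipes):
    """Return the mean recipe size and the ingredients ranked by usage.

    The ranking is a list of (ingredient, number of recipes using it)
    pairs sorted from most to least frequent, which is the input to a
    frequency-rank plot such as Fig. 1D.
    """
    sizes = [len(recipe) for recipe in recipes]
    mean_size = sum(sizes) / len(sizes)
    counts = Counter(ing for recipe in recipes for ing in recipe)
    return mean_size, counts.most_common()

mean_size, ranked = recipe_statistics(recipes)

# Share of recipes containing egg, using the counts quoted in the text.
egg_share = 20951 / 56498
```

With the figures quoted in the text, `egg_share` is about 0.37, consistent with the statement that egg appears in more than one third of all recipes.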