FoodOn is an open-source, comprehensive ontology resource composed of term hierarchy facets that cover basic raw food source ingredients, process terms for packaging, cooking and preservation, and an upper-level variety of product type schemes under which food products can be categorized, outlined in Fig. 1.

Fig. 1 [Food product diagram]. The FoodOn food product scheme derived mainly from LanguaL food description facets, with the addition of ontology relationships between a food product and its related descriptive qualities, components, and processes.

FoodOn is provided in the Web Ontology Language (OWL) format at the project’s GitHub repository (https://github.com/FoodOntology/foodon), where new term requests and technical support are handled.7 The latest version of the resource can also be explored via ontology lookup services like Ontobee (http://ontobee.org), the European Bioinformatics Institute (EMBL-EBI) Ontology Lookup Service (https://www.ebi.ac.uk/ols/), and BioPortal (https://bioportal.bioontology.org).

The food source hierarchy

In LanguaL the “food source” facet of about 3400 terms describes “the individual plant, animal, or chemical food source from which the food product or its major ingredient is derived.” FoodOn mirrors the organism food source terms closely, with intermediate groups like “stem or spear vegetable”, but moves chemicals (mainly additives) over to a “food component class” to separate them from whole organism references. LanguaL’s food source organisms often have associated species and/or higher level taxonomic identifiers from the Integrated Taxonomic Information System (ITIS) among others. FoodOn preserves LanguaL’s species taxonomic information as database cross-reference annotations. As well, if a FoodOn term’s ITIS reference can also be mapped to an NCBITaxon resource item, then FoodOn uses a ‘has taxonomic identifier’ relation to link the two to facilitate access to NCBI taxonomic and other linked information (e.g., sequence data), as Table 1 illustrates.8

Table 1. A FoodOn food source term like ‘apple tree food source’ is positioned as a subclass of a common-language food group like ‘pome fruit plant food source’, and is often qualified by at least one biological taxonomic identifier.

The part of plant or animal hierarchy

While some food terms usually refer to a whole edible organism (anchovy, grasshopper), others colloquially refer just to part of an organism (berry, not the bush; apple, not the tree), and some of those parts are not always present or edible in the organism. LanguaL’s “part of plant or animal” facet is defined as “Anatomical part of the plant or animal from which the food product or its major ingredient is derived ...” FoodOn echoes most of LanguaL’s plant and animal part descriptors, both anatomical (arm, organ meat, seed) and fluid (blood, milk), but reuses existing UBERON and Plant Ontology term identifiers for them. This leads to food products like apple being defined as: “‘apple (whole) food product’ SubClassOf: ‘pome fruit’ and ‘develops from part of’ some ‘apple tree as food source’”.9 Future work may involve detailing the exact parts and stage-of-life conditions that make a given food bioavailable (for example, an ‘apple’ is only “part of” its tree during the annual fruiting cycle, is only edible when ripe, and needs a proviso that its seeds are lethal if eaten in sufficient quantity).

Food products and product types

Single or multi-component foods need to be described for food inspection recordkeeping, disease outbreak investigation, food industry supply-chain inventory, and to accommodate dietary restrictions and recipe adjustments. FoodOn’s design differs from LanguaL’s in order to achieve this functionality, and these differences are highlighted below.

LanguaL’s food product indexing guidelines adequately describe single-ingredient foods by allowing one primary food source (facet B) ingredient to be stated, which other facets like “cooking method” implicitly reference. LanguaL indexing is typically applied to a database of food items such that “Each food is described by a set of standard, controlled terms chosen from facets characteristic of the nutritional and/or hygienic quality of a food ...”, yielding a list of LanguaL term identifiers for each item. For example, in LanguaL “corn flakes” would be indexed as a set of facet codes including “A0258 B1379 C0208 E0153 F0014 G0003 H0100 H0138 H0158 H0274 J0116 K0003 M0001 N0001 P0024”, as partly shown in Fig. 2; these codes can be looked up on the LanguaL website thesaurus page at http://www.langual.org/.10
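As a minimal sketch of how such an index entry can be handled in software, the snippet below groups the corn flakes codes quoted above by their leading facet letter. The function name `group_by_facet` is an illustrative assumption, and only facet B (“food source”) is named in the text, so no other facet labels are attached.

```python
from collections import defaultdict

# Facet code string quoted in the text for "corn flakes".
CORN_FLAKES_CODES = (
    "A0258 B1379 C0208 E0153 F0014 G0003 H0100 H0138 H0158 H0274 "
    "J0116 K0003 M0001 N0001 P0024"
)


def group_by_facet(code_string):
    """Group LanguaL term identifiers by their leading facet letter."""
    facets = defaultdict(list)
    for code in code_string.split():
        facets[code[0]].append(code)
    return dict(facets)


corn_flakes = group_by_facet(CORN_FLAKES_CODES)
print(corn_flakes["B"])  # the single primary food source term
print(corn_flakes["H"])  # one facet may contribute several terms
```

The result is a flat bag of codes per facet: nothing in it says which facet term qualifies which ingredient, and that gap is what the ontology relations described below address.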

Fig. 2 [Corn flakes diagram]. A sample of LanguaL facet terms used to describe a brand name corn flake breakfast cereal, and FoodOn’s corn flakes product representation which uses OWL ontology object properties to link a food product to its components, qualities, and processes.

Multiple-component foods are more challenging because LanguaL itself does not aspire to be a global food type catalog, and so provides no facility for giving identifiers to component food products. LanguaL suggests curators follow a “Full Ingredient Indexing” protocol in which all ingredients of a product are coded in descending order by weight, but for products like lasagna, one cannot reference components like “lasagna noodle” or “cheese” in the list; only food source items like “durum wheat” are allowed. LanguaL provides one other way to reference raw ingredients besides the primary one: a set of “ingredient added” terms, which from an ontology perspective awkwardly duplicate some, but not all, terms in the “food source” facet.11

In a major departure from LanguaL, FoodOn allows food product terms like lasagna noodle (FOODON_03306124) to be defined directly in the ontology, and allows them to reference component products through various relations which do not exist in LanguaL. The “has ingredient” relation applies between two food products, covering the case where a component may no longer be discernible in a final product. “Has part” may be used when a food literally has a part of some other food, unchanged, as in an apple in a caramel apple. The “composed primarily of” relation can replace “has part” if the part is the greatest constituent. “Derives from” is used when a product is transformed by a process in some way from its initial substance, as in applesauce, which derives from apple. “Develops from part of” is used where a food product is a non-essential part of a food source organism (e.g., zucchini, apple or other fruit). FoodOn has deprecated most of LanguaL’s “ingredient added” hierarchy and instead uses the above relations to reference ingredients. “Output of” indicates that a food product is the product of a given process. “Has quality” holds between a food material and biological, physical, chemical or organoleptic properties which result from corresponding processes. The two approaches to documenting a product are contrasted in Fig. 2.
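The following sketch shows one way these component relations could be traversed in application code, for example to list everything a product contains when checking a dietary restriction. It uses plain labels instead of FoodOn identifiers, and the specific lasagna axioms, the `COMPONENT_RELATIONS` grouping, and the `components` function are illustrative assumptions, not FoodOn content.

```python
# Each entry: (subject product, relation, object). Labels stand in for
# FoodOn identifiers; the lasagna axioms are illustrative assumptions.
AXIOMS = [
    ("lasagna", "has ingredient", "lasagna noodle"),
    ("lasagna", "has ingredient", "cheese"),
    ("lasagna noodle", "derives from", "durum wheat"),
    ("caramel apple", "has part", "apple"),
    ("applesauce", "derives from", "apple"),
    ("apple", "develops from part of", "apple tree"),
]

# Relations treated here as "contains or is made from" links (an assumption).
COMPONENT_RELATIONS = {
    "has ingredient",
    "has part",
    "composed primarily of",
    "derives from",
}


def components(product, axioms=AXIOMS):
    """Return every food reachable from `product` via component relations."""
    found = set()
    stack = [product]
    while stack:
        current = stack.pop()
        for subject, relation, obj in axioms:
            if (
                subject == current
                and relation in COMPONENT_RELATIONS
                and obj not in found
            ):
                found.add(obj)
                stack.append(obj)
    return found


print(sorted(components("lasagna")))
# ['cheese', 'durum wheat', 'lasagna noodle']
```

Because “lasagna noodle” is itself a term with its own axioms, the traversal reaches “durum wheat” without the lasagna entry having to list it, which a flat facet index cannot express.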

For food component references to work, FoodOn requires ontology terms and identifiers for all such components. Coverage in this domain has been started by placing food product types (currently numbering 9445 classes) into a “foodon product type” branch, contained in a “foodon_product_import.owl” file. Some of these classes were inherited from the Environment Ontology’s (ENVO) existing sub-domain of food products, while the remainder are from the LanguaL index of FDA’s Scientific Information and Retrieval Exchange Network (SIREN) food database of over 9500 foods which are referenced by FDA regulatory activity documentation, and which anticipate many terms that would otherwise be added piecemeal.12,13,14 Currently, most of the “foodon product type” hierarchy is set explicitly but this will transition to an inferred structure when its growing list of axiomatized products (like sliced canned apples, and baked apple pie, as illustrated in Fig. 3) is sufficiently large.

Fig. 3 [Apple product diagram]. Overview of apple food products based on “apple (whole) food product”. Products have observable qualities and parts often as a result of the processes that transform them.

New FoodOn products can be organized under the foodon product type branch as well as other schemes brought in from LanguaL’s standard product type schemes including the EuroFIR Food Classification and USDA Standard Reference schemes.5 FoodOn now has coverage of some Asian foods via GitHub requests; other databases (like the LanguaL-indexed French, Greek, and Hungarian ones) could be imported in the future to increase international coverage.

The SIREN food product database does not provide definitions directly, so an ongoing FoodOn task is to populate the imported SIREN terms with appropriate Wikipedia definitions. While consensus on some product definitions may be challenging (for example, should the definition of lasagna expressly allow for cheese substitutes?), FoodOn does want to accommodate the description of more general food categories, as well as food products about which little is known. Conversely, FoodOn avoids too-specific “pre-composed” terms (terms which represent a specific combination of other variables and which verge on recipes). For example, in “apple, raw, without skin, sliced, cooked, microwaved”, removing the cooking method variation allows the class greater applicability. If the cooking method should be preserved in the data at hand, it may be given by a separate field or relationship.

Food analogs and allergens

It is helpful to link ingredients to substitutes for use in analyses and applications that are sensitive to allergen and other dietary constraints. FoodOn has a “has food substance analog” relation which can connect any two food source items or products, inviting substitution. This symmetric relation allows us to associate natural and synthetic vanilla, but makes no assumption about which side is imitating the other, or about the quality of the substitution or the appropriate ratio. Pertinent to allergen analysis and food substitution, FoodOn food source terms related to allergic hypersensitivity diseases are being referenced from within the Disease Ontology (DO).15
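A symmetric relation of this kind can be sketched as an index that records every pair in both directions. The `AnalogIndex` class below is a hypothetical illustration, not part of FoodOn or its tooling.

```python
from collections import defaultdict


class AnalogIndex:
    """Stores a symmetric 'has food substance analog' relation."""

    def __init__(self):
        self._analogs = defaultdict(set)

    def add(self, a, b):
        # Symmetric: record both directions, with no notion of which
        # side imitates the other and no substitution ratio.
        self._analogs[a].add(b)
        self._analogs[b].add(a)

    def analogs_of(self, item):
        """Return a copy of the set of analogs recorded for `item`."""
        return set(self._analogs.get(item, set()))


index = AnalogIndex()
index.add("natural vanilla", "synthetic vanilla")
print(index.analogs_of("synthetic vanilla"))  # {'natural vanilla'}
```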

Ontology reuse

FoodOn aspires to be a well-documented, actively curated and stable standard, but this depends ultimately on the quality and longevity of its curation model and expert community. As illustrated in Fig. 4, FoodOn’s membership in OBO Foundry enables seamless access to ontologies that cover domains like consumer demographics, agricultural practice, chemical composition and antimicrobials, taxonomy, anatomy, and disease phenotype, which all coexist like mutually-referencing volumes of an encyclopedia.

Fig. 4 [Foodon component pie-shaped diagram]. FoodOn reuses terms from a number of OBOFoundry.org ontologies as well as LanguaL and SIREN.

As shown in Figs 5 and 6, FoodOn aims to cover food products and broad food processing steps, acting as more of a generalist hub that interfaces with more specialist domain ontologies that involve technical food science modeling. This follows the same orthogonal pattern that ENVO has with respect to FoodOn, AGRO, and the CROP ontology among others. FoodOn product hierarchies and relations will continue to expand with new intermediary classes introduced as needed.

Fig. 5 [Subject branch diagram]. A tree visualization of 15 upper-level FoodOn topical branches.

Fig. 6 [Form application diagram]. Rendering a FoodOn-driven specification as a web form using the GEEM platform.

OBO Foundry encourages each ontology (with some exceptions) to reuse terms from others where applicable. Reuse of terms allows the effort of providing standardized vocabulary to be shared; for example, FoodOn has replaced about 600 LanguaL chemicals (e.g., food additives) with ChEBI ontology chemical identifiers.16 OBO Foundry ontologies must aspire to a certain overall technical structure, including the upper-level Basic Formal Ontology, curation best practices that involve versioning of ontology files, permanent URLs for terms, and a scheme for annotating deprecated terms’ replacements so that database content can be updated smoothly.17
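To illustrate why replacement annotations on deprecated terms matter for downstream databases, the sketch below follows a chain of replacements until it reaches a current term and rewrites stored records accordingly. All identifiers are hypothetical placeholders, and the `REPLACED_BY` table is an assumed stand-in for annotations read from an ontology file.

```python
# Hypothetical identifiers; the mapping mimics a "replaced by" annotation
# attached to each deprecated term.
REPLACED_BY = {
    "EXAMPLE:0000001": "EXAMPLE:0000010",
    # A replacement may itself be deprecated in a later release.
    "EXAMPLE:0000010": "EXAMPLE:0000020",
}


def current_term(term_id, replaced_by=REPLACED_BY):
    """Follow replacement annotations until a non-deprecated term is reached."""
    seen = set()
    while term_id in replaced_by:
        if term_id in seen:
            raise ValueError(f"replacement cycle at {term_id}")
        seen.add(term_id)
        term_id = replaced_by[term_id]
    return term_id


records = [{"food": "EXAMPLE:0000001"}, {"food": "EXAMPLE:0000030"}]
updated = [{**record, "food": current_term(record["food"])} for record in records]
print(updated)
# [{'food': 'EXAMPLE:0000020'}, {'food': 'EXAMPLE:0000030'}]
```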

Most relations in Fig. 1 are from the OBO Foundry’s Relation Ontology (RO, http://obofoundry.org/ontology/ro.html), and carry OWL domain and range (codomain) restrictions. In FoodOn, these are combined with the disjointWith axioms of the upper-level Basic Formal Ontology (BFO), allowing a reasoner like ELK (OWL 2 EL profile) or HermiT (OWL 2 DL profile) to enforce proper reference to processes, qualities, material entities, roles, and information entities in FoodOn. For example, a proposed logical definition for “hen”:

‘hen (food source)’ : ‘chicken (food source)’ and (‘has quality’ some ‘female organism’) and (‘has quality’ some ‘adult organism’)

leads to a contradiction because under BFO, the “has quality” relation only permits qualities in its range (right side of the relation). “female organism” and “adult organism” are material entities, a type of BFO “independent continuant” that is disjoint with respect to qualities. In other words, qualities are features of material entities but cannot themselves be material entities. The hen definition can be resolved by stating more directly:

‘hen (food source)’ : ‘chicken (food source)’ and ‘female organism’ and ‘adult organism’
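The check a reasoner performs here can be approximated with a toy example. The sketch below hard-codes the BFO category of each class and the range of “has quality”, and flags restrictions whose filler falls in a different (and therefore, in this simplification, disjoint) category. It is an assumption-laden stand-in: ELK and HermiT derive such conflicts from OWL axioms instead of a lookup table, and the names `CATEGORY`, `RANGE`, and `range_violations` are hypothetical.

```python
# Toy stand-in for the BFO categories named in the text. Each class is
# assigned exactly one category, and distinct categories are assumed disjoint.
CATEGORY = {
    "chicken (food source)": "material entity",
    "female organism": "material entity",
    "adult organism": "material entity",
}

# 'has quality' only permits qualities in its range.
RANGE = {"has quality": "quality"}


def range_violations(restrictions):
    """Return (relation, filler) pairs whose filler lies outside the relation's range.

    Fillers with no known category are not flagged.
    """
    return [
        (relation, filler)
        for relation, filler in restrictions
        if relation in RANGE
        and CATEGORY.get(filler) not in (None, RANGE[relation])
    ]


# Restrictions from the first proposed "hen" definition.
proposed = [
    ("has quality", "female organism"),
    ("has quality", "adult organism"),
]
print(range_violations(proposed))  # both restrictions are flagged

# The revised definition intersects the classes directly, so it contains
# no 'has quality' restriction to check.
print(range_violations([]))  # []
```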

FoodOn supports reuse in third-party standards via its GitHub repository, allowing users to access and retrieve a particular version or release at any time. However, incorporating such ontology content directly into agency infrastructure often requires a mastery of fairly complex Semantic Web technology, including knowledge of OWL and the associated SPARQL query language, as well as the abstractions of an upper-level ontology under which terms are organized.18 Various efforts are encouraging ontology reuse without the need for extensive training by providing web portals of customizable spreadsheet or database templates and downloadable specifications, all driven by standardized ontology content.19,20 Hsiao Lab is developing a tool that enables marked-up ontology content to be transformed into standards which are provided in both a visual web form and a tabular or JSON version for implementation in data curation and exchange systems.21 Figure 6 shows an example FoodOn-driven standard for food specimen contextual data (viewable via Google Chrome at http://genepio.org/geem/form.html#GENEPIO:0002083). Technically, this is accomplished using a Python script that uses the rdflib module to read an ontology into memory as an RDF graph of triples, and then uses SPARQL to query it and convert it into a JSON representation which the GEEM web interface then renders as HTML forms or downloadable specifications.
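The shape of that triples-to-JSON step can be sketched without any third-party dependency. In the snippet below, a hand-written list of tuples stands in for the RDF graph that rdflib would load, and a simple filter function stands in for a SPARQL query; this is a deliberate substitution to keep the example self-contained, not the GEEM implementation. All identifiers, predicate names, and the output layout are hypothetical.

```python
import json

# Hand-written triples standing in for an ontology loaded as an RDF graph.
# All identifiers and predicate names below are hypothetical.
TRIPLES = [
    ("EX:spec", "label", "food specimen contextual data"),
    ("EX:spec", "has member", "EX:food_product"),
    ("EX:spec", "has member", "EX:collection_date"),
    ("EX:food_product", "label", "food product"),
    ("EX:food_product", "datatype", "ontology term"),
    ("EX:collection_date", "label", "collection date"),
    ("EX:collection_date", "datatype", "date"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return objects of triples matching subject and predicate.

    This plays the role a SPARQL SELECT query would play against a real graph.
    """
    return [o for s, p, o in triples if s == subject and p == predicate]


def to_spec(spec_id):
    """Build a JSON-serializable form specification for one standard."""
    return {
        "id": spec_id,
        "label": objects(spec_id, "label")[0],
        "fields": [
            {
                "id": field,
                "label": objects(field, "label")[0],
                "datatype": objects(field, "datatype")[0],
            }
            for field in objects(spec_id, "has member")
        ],
    }


spec = to_spec("EX:spec")
print(json.dumps(spec, indent=2))
```

A web front end could then walk the `fields` list to emit one form input per entry, choosing the widget from the `datatype` value.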