Computational pipeline that uses a part-of-speech (POS) tagger and human annotation to convert recipe text into a tree representation, then computes pairwise distances between the trees to visualize recipe similarities.

Data Gathering: We crawl all search results for a queried dish, such as chocolate chip cookies or tomato pasta, from recipe websites that use the schema.org Recipe schema.
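As a rough illustration of what reading schema.org Recipe markup can look like, the sketch below pulls Recipe objects out of a page. It assumes the markup is embedded as JSON-LD in `<script type="application/ld+json">` tags, which is only one of the encodings schema.org allows. The names `JsonLdExtractor`, `extract_recipes`, and `SAMPLE_PAGE` are hypothetical, and the crawling step itself is omitted.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_recipes(html):
    """Return every JSON-LD object in the page whose @type includes Recipe."""
    parser = JsonLdExtractor()
    parser.feed(html)
    recipes = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup
        # A block may hold a single object, a list, or an @graph wrapper.
        if isinstance(data, dict) and "@graph" in data:
            items = data["@graph"]
        elif isinstance(data, list):
            items = data
        else:
            items = [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Recipe" in types:
                recipes.append(item)
    return recipes


# Hypothetical page, made up for illustration only.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe",
 "name": "Chocolate Chip Cookies",
 "recipeIngredient": ["2 cups flour", "1 cup chocolate chips"],
 "recipeInstructions": "Mix the flour and chips. Bake for 10 minutes."}
</script>
</head><body></body></html>
"""

recipes = extract_recipes(SAMPLE_PAGE)
```

Sites that publish the same schema as Microdata or RDFa would need a different extractor.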

Parsing: We use an off-the-shelf POS tagger together with human annotation to parse the tokens of the crawled recipes. More detail is provided in the section below on the annotation interface.

Similarity Comparison: To measure the similarity between recipes, we use tree edit distance, a commonly used technique for comparing tree structures. To make the structural comparison also reflect the semantic difference between individual cooking actions and ingredients, we dynamically adjust the weights of the relabel operations. These weights are calculated from the cosine similarities of the corresponding words in a pre-trained word embedding model.
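The idea of an embedding-weighted relabel cost can be sketched as follows. This is a minimal illustration and not the exact procedure used here. It assumes unit insert and delete costs, a relabel cost of one minus the cosine similarity (clipped to [0, 1]), and a textbook memoized recursion for ordered tree edit distance. The `EMBEDDINGS` table holds made-up 2-d vectors standing in for a pre-trained model, and the `node` helper and the tree shapes are hypothetical.

```python
import math
from functools import lru_cache

# Toy vectors standing in for a pre-trained embedding model (made-up values).
EMBEDDINGS = {
    "bake": (1.0, 0.0),
    "roast": (0.8, 0.6),
    "mix": (0.0, 1.0),
}

INSERT_COST = 1.0  # assumption: unit cost
DELETE_COST = 1.0  # assumption: unit cost


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relabel_cost(a, b):
    """Cheap for semantically close labels, 1.0 for unrelated or unknown ones."""
    if a == b:
        return 0.0
    if a in EMBEDDINGS and b in EMBEDDINGS:
        return min(1.0, max(0.0, 1.0 - cosine(EMBEDDINGS[a], EMBEDDINGS[b])))
    return 1.0


def node(label, *children):
    """A tree is a hashable (label, children) tuple so it can be memoized."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0.0
    if not g:
        _, kids = f[-1]
        # Deleting the rightmost root promotes its children into the forest.
        return forest_distance(f[:-1] + kids, g) + DELETE_COST
    if not f:
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + INSERT_COST
    (v, v_kids), (w, w_kids) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + v_kids, g) + DELETE_COST,
        forest_distance(f, g[:-1] + w_kids) + INSERT_COST,
        # Match the two rightmost roots and recurse on children and the rest.
        forest_distance(v_kids, w_kids)
        + forest_distance(f[:-1], g[:-1])
        + relabel_cost(v, w),
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


baked = node("bake", node("mix", node("flour"), node("sugar")))
roasted = node("roast", node("mix", node("flour"), node("sugar")))
no_sugar = node("bake", node("mix", node("flour")))

d_action = tree_edit_distance(baked, roasted)      # one cheap relabel
d_ingredient = tree_edit_distance(baked, no_sugar)  # one full-cost deletion
```

With these toy vectors, swapping "bake" for "roast" costs far less than dropping an ingredient, which is the effect the weighted relabel operation is meant to capture.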

Distance Matrix: The pairwise distances are stored in a distance matrix, where each element is the tree edit distance between the corresponding pair of recipes. The distance matrix is then converted into (x, y) coordinates using the Gram matrix.
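One standard way to go from a distance matrix to coordinates through the Gram matrix is classical multidimensional scaling: double-center the squared distances to get the Gram matrix, then scale its top eigenvectors by the square roots of their eigenvalues. The sketch below assumes that this is the intended conversion. The function name `coordinates_from_distances` is hypothetical, and the example distances are a 3-4-5 triangle chosen because it embeds exactly in the plane.

```python
import numpy as np


def coordinates_from_distances(dist, dims=2):
    """Classical MDS: embed points so pairwise distances approximate `dist`."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centering the squared distances yields the Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh is appropriate because the Gram matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dims]
    # Edit distances need not be Euclidean, so clip negative eigenvalues to 0.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Made-up distances between three points forming a 3-4-5 right triangle.
example = [[0.0, 3.0, 4.0],
           [3.0, 0.0, 5.0],
           [4.0, 5.0, 0.0]]
xy = coordinates_from_distances(example)
recovered = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
```

For tree edit distances the recovered distances are generally only an approximation, since clipping discards the non-Euclidean part of the matrix.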

Hierarchical Clustering: We use hierarchical clustering to group recipes with similar procedures.
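A minimal sketch of clustering directly on a precomputed distance matrix with SciPy is given below. The linkage method (average) and the cut into two clusters are assumptions, since neither is specified here, and the distance values are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Made-up pairwise distances for four recipes: 0 and 1 are close, 2 and 3 are close.
distances = np.array([
    [0.0, 0.2, 3.0, 3.2],
    [0.2, 0.0, 3.1, 3.0],
    [3.0, 3.1, 0.0, 0.4],
    [3.2, 3.0, 0.4, 0.0],
])

# linkage expects the condensed (upper-triangle) form of a distance matrix.
condensed = squareform(distances)
merges = linkage(condensed, method="average")  # assumption: average linkage

# Cut the dendrogram so that at most two clusters remain.
labels = fcluster(merges, t=2, criterion="maxclust")
```

Each entry of `labels` gives the cluster of the recipe at the same index, and `merges` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the full hierarchy.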