If you’ve ever wondered whether Dota 2 or League of Legends is the more popular multiplayer online battle arena game, or how long you’d need to spend on a treadmill to burn off that party-size bag of chips you just ate, you know you can probably find the answer by consulting a couple of relevant information sources and then applying what feels like a natural, straightforward reasoning process.

However, using multi-step reasoning to reach an understanding and solve a problem remains a challenge for machine learning algorithms. Current mainstream question answering (QA) systems and large-scale QA datasets such as the Stanford Question Answering Dataset (SQuAD) can only perform single-hop reasoning within a limited-size text block, finding answers through keyword matching. Meanwhile, datasets that do support multi-step reasoning (e.g. COMPLEXWEBQUESTIONS and QAngaroo) exhibit low diversity because they were constructed from predefined knowledge bases. No existing QA dataset can effectively guide a model to learn both reasoning and explanation.

To address this issue, a team that calls itself the “Hot Pot Brothers” — Zhilin Yang of Carnegie Mellon University, Peng Qi of Stanford University, and Saizheng Zhang of the Montreal Institute for Learning Algorithms — recently introduced the HotpotQA Dataset, which supports a diverse, complex-reasoning-driven natural language QA system.

The trio got the original inspiration for the project over a hot pot dinner in New York City. They noted that a hot pot’s taste results from its multiple ingredients, just as correct answers to queries require multi-hop reasoning based on multiple information sources.

HotpotQA comprises 113,000 QA pairs based on Wikipedia, one of the world’s largest and most reliable general reference websites. The authors focused mainly on hyperlinks in the first paragraphs of Wikipedia articles to extract the most informative supporting materials for QA creation. For each question type, the team randomly sampled corresponding curated content to generate candidate paragraph pairs, then crowdsourced query creation, question answering, and supporting-fact collection.

Overall data collection procedure.

The HotpotQA Dataset questions are designed to require multi-hop reasoning, i.e. searching for the answer across different sources. The queries are also highly diverse and not limited to any pre-existing knowledge base or schema. Meanwhile, the provided sentence-level supporting facts enable QA systems to reason with strong supervision while improving their explanation capabilities.

An example of multi-hop questions in HotpotQA. Supporting facts (highlighted in green) are also part of the dataset.
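To make the pairing of questions with sentence-level supporting facts concrete, here is a minimal sketch of what one such example looks like when loaded in Python. The field names and nesting are assumptions drawn from the publicly released JSON files (not an official schema), and the toy question text below is purely illustrative:

```python
# Sketch of one HotpotQA-style example. Field names ("question", "answer",
# "type", "context", "supporting_facts") are assumptions based on the public
# release; the paragraph text here is invented for illustration.
example = {
    "question": "In which country is the city where the author of Book X was born?",
    "answer": "France",
    "type": "bridge",          # multi-hop "bridge" question (vs. "comparison")
    "context": [               # list of [paragraph title, list of sentences]
        ["Author of Book X", [
            "The author of Book X was born in Lyon.",
            "They published Book X in 1990.",
        ]],
        ["Lyon", [
            "Lyon is a city in France.",
            "It lies at the confluence of two rivers.",
        ]],
    ],
    "supporting_facts": [      # list of [paragraph title, sentence index]
        ["Author of Book X", 0],
        ["Lyon", 0],
    ],
}

def supporting_sentences(ex):
    """Resolve sentence-level supporting facts against the context paragraphs."""
    paragraphs = {title: sents for title, sents in ex["context"]}
    return [paragraphs[title][idx] for title, idx in ex["supporting_facts"]]

# The two resolved sentences span different paragraphs, which is exactly
# what forces a model to hop between sources rather than keyword-match.
print(supporting_sentences(example))
```

Because the supporting facts are indexed at the sentence level, a model can be trained not only to produce the answer but also to point to the exact sentences that justify it.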

Distribution of question and answer categories in HotpotQA.

For more information on the project, the team’s Conference on Empirical Methods in Natural Language Processing (EMNLP) 2018 paper, “HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering,” is available on arXiv. Visit HotpotQA’s official website to download the project guide, training dataset, and code.

Source: Synced China