For an AI system to acquire knowledge the way humans generally do, it would need to interact with its surroundings and extract information through its own attention and analysis choices. That’s the idea behind a new paper from Microsoft Research, Polytechnique Montreal, MILA and the University of Montreal. QAit (Question Answering with Interactive Text) introduces an AI system that “learns” to answer questions by interacting with and gathering information from its environment.

QAit question answering game

The AI community has seen the emergence of countless machine reading comprehension (MRC) tasks in recent years. Most of these new MRC tasks, however, rely on preloaded knowledge sources and answer questions either by extracting specific words from a knowledge source or by generating text strings.

There is nothing interactive about these traditional question answering models, which tend to behave more like shallow pattern matchers: they need fully observed information to predict answers and focus only on declarative knowledge (facts that can be stated explicitly).

QAit is different in that the agent focuses on procedural knowledge, and can interact with its partially observed environment and generate training data in real time. The researchers built text-based games with relevant question-answer pairs. The question types include location, existence and attribute. They created both fixed maps and random maps, depending on whether the layout of the environment and the objects within it are fixed or randomized.
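To make the three question types concrete, here is a minimal sketch of how such question-answer pairs could be evaluated against a toy game state. The state format, entity names and helper functions are illustrative assumptions, not the paper’s actual implementation.

```python
# Toy world state: rooms mapping to objects, each with attribute flags.
# Purely hypothetical; QAit's games are built on richer text-based engines.
GAME_STATE = {
    "kitchen": {"apple": {"cooked": False}, "knife": {"sharp": True}},
    "bedroom": {"book": {"open": False}},
}

def answer_location(entity, state):
    """Location question, e.g. 'Where is the apple?'"""
    for room, objects in state.items():
        if entity in objects:
            return room
    return "unknown"

def answer_existence(entity, state):
    """Existence question, e.g. 'Is there a sword in the world?'"""
    return any(entity in objects for objects in state.values())

def answer_attribute(entity, attribute, state):
    """Attribute question, e.g. 'Is the knife sharp?'"""
    for objects in state.values():
        if entity in objects:
            return objects[entity].get(attribute, False)
    return False

print(answer_location("apple", GAME_STATE))            # kitchen
print(answer_existence("sword", GAME_STATE))           # False
print(answer_attribute("knife", "sharp", GAME_STATE))  # True
```

The key difference from standard MRC is that the agent cannot read this state directly; it only observes text descriptions of its current room and must move around and interact to gather the evidence these functions consult.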

The researchers used QA-DQN as their baseline agent and trained with vanilla DQN, Double DQN and Rainbow.
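The difference between vanilla DQN and Double DQN comes down to how the bootstrap target is computed. The sketch below illustrates the two targets with toy Q-value arrays; the shapes and values are assumptions for illustration, not QAit specifics.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9    # discount factor
reward = 1.0   # toy one-step reward

# Q-values for the next state from the online and target networks.
q_online = rng.normal(size=5)
q_target = rng.normal(size=5)

# Vanilla DQN: the target network both selects and evaluates the next
# action, which tends to overestimate Q-values.
dqn_target = reward + gamma * q_target.max()

# Double DQN: the online network selects the action and the target
# network evaluates it, reducing the overestimation bias.
best_action = int(q_online.argmax())
ddqn_target = reward + gamma * q_target[best_action]
```

Rainbow combines Double DQN with several further improvements (prioritized replay, dueling heads, distributional value estimates, among others), which is consistent with it needing more data before its advantages show.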

Overall architecture of baseline agent

The agent consists of an encoder, an aggregator, a command generator and a question answerer. The encoder maps inputs (observations and questions) to hidden representations, which the aggregator merges. The command generator then produces Q-values for all action, modifier and object words.
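The information flow through these components can be sketched as below. This is a minimal numpy stand-in: the random "encoder", mean-pooling aggregation, linear heads and vocabulary sizes are all illustrative assumptions, not the paper's actual layers.

```python
import numpy as np

rng = np.random.default_rng(42)
hidden = 32
action_vocab, modifier_vocab, object_vocab = 8, 10, 12

def encode(tokens, dim=hidden):
    """Stand-in encoder: map a token sequence to per-token hidden states."""
    return rng.normal(size=(len(tokens), dim))

obs_h = encode(["you", "are", "in", "the", "kitchen"])
question_h = encode(["where", "is", "the", "apple"])

# Aggregator: merge observation and question representations
# (here simply concatenate and mean-pool into one state vector).
state = np.concatenate([obs_h, question_h]).mean(axis=0)

# Command generator: one linear head per word slot, producing Q-values
# over action, modifier and object words; a command such as
# "open wooden door" is the argmax of each head.
W_action = rng.normal(size=(hidden, action_vocab))
W_modifier = rng.normal(size=(hidden, modifier_vocab))
W_object = rng.normal(size=(hidden, object_vocab))

q_action = state @ W_action
q_modifier = state @ W_modifier
q_object = state @ W_object
command = (int(q_action.argmax()), int(q_modifier.argmax()), int(q_object.argmax()))
```

Factoring the command into action/modifier/object slots keeps the output space tractable: three small heads replace a single head over every possible command string.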

Training accuracy on the fixed map (left) and random map setup (right)

Agent performance when trained on 10 games, 500 games and “unlimited” games

When trained with insufficient information, the agent was able to master the training games when the amount of training data was small, especially with vanilla DQN and DDQN. As the amount of training data increases, Rainbow becomes more effective.

Researchers also tested training under sufficient information, with the results shown below:

Test performance given sufficient information

Not surprisingly, performance improved significantly in experiments with sufficient information. The researchers conclude that QAit can help train models to learn effectively given sufficient information, suggesting that interactive, more human-like information-seeking models may be a research direction that can challenge simple word-matching methods.

The paper Interactive Language Learning by Question Answering is on arXiv.