In our Easter 2017 recruiting campaign, we were given the following challenge:

Can you produce an AI program that manages to play the old Norwegian text-based online game VikingMUD? The program also has to deal with the other players, in both the PvE and PvP areas. The address of the game is http://vikingmud.org/

First I have to say that this is a very exciting problem, and I was lucky to be handed this challenge. Solving such a challenge is one of the next frontiers of machine learning. So what I’ll do is show you how far the latest results in machine learning can take us along the path to a solution.

VikingMUD is a Multi-User Dungeon (MUD) game. The game describes the current state of the world in text, and the player acts by typing natural-language commands.

When designing an artificial agent, people try to mimic the human ability to learn by trial and error, using rewards and punishments. This paradigm is known as reinforcement learning (RL). In our case, the agent has to learn directly from the text of the game, without any heuristics or hand-made features. Learning from raw input like this is what deep learning is used for. For the agent to be able to choose «good» actions over «bad» actions, one uses a value function, usually denoted by Q, which measures an action’s expected long-term reward (that is, the average future profit you can expect after taking that action in the current state). Q-Learning [Watkins and Dayan 1992] is a technique for learning an optimal Q for an agent. One usually starts with a random Q-function, which the agent continually updates by playing the game and obtaining rewards. While training, the agent can use the latest version at each step to choose the action with the highest value, and so maximize its expected future reward. There is some exciting math under the hood, which guarantees that, under suitable conditions (for example, that every state-action pair keeps being tried), Q converges to the optimal value function after enough iterations.
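To make the update loop concrete, here is a minimal sketch of tabular Q-learning. It does not touch VikingMUD at all: the corridor world, its two commands, the reward of 1 for reaching the last room, and the values of alpha, gamma and epsilon are all my own assumptions, chosen only to keep the example tiny.

```python
import random
from collections import defaultdict

# Hypothetical toy world: rooms 0..4 in a row; reaching room 4 ends the episode.
ACTIONS = ["go west", "go east"]
GOAL = 4

def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    nxt = max(0, state - 1) if action == "go west" else state + 1
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Start from a (small) random Q-function, stored as a table.
    q = defaultdict(lambda: rng.uniform(0.0, 0.01))
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly take the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = train()
# Greedy policy read off the learned table.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

After training, the greedy policy should walk east in every room. A table like this only works because the toy world has five states; a real MUD has far too many text descriptions to enumerate, which is where the neural networks discussed next come in.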

In many cases, finding this Q-function is a big challenge, because there are far too many states to store a value for each one. It turns out that artificial neural networks can be used to approximate the Q-function. I have picked the following articles, which address the problem of making an artificial agent capable of playing text-based games:

● Deep Reinforcement Learning with a Natural Language Action Space [DRLNLAS]

● Language Understanding for Text-based Games using Deep Reinforcement Learning [LUTGDRL]

In both, the authors provide Deep Q-Networks (DQN) to find good approximations of the Q-function.

Strategy

We pursue the following strategy:

Create a machine learning model that learns state representations of the game and action policies using game rewards as feedback.

The authors of [DRLNLAS] make the following distinction between text-based games, according to how players have to respond and communicate with the game engine:

● Parser-based. Games that accept typed-in commands from the player, such as «go east», «move chair upstairs».

● Choice-based. Games that present options after the state text.

● Hypertext-based. Games that present actions embedded within the state text.

The strategies for approaching these games vary according to the types above. For choice-based or hypertext-based games, the action space can increase exponentially with the length of the action sentences. Therefore, the authors of [DRLNLAS] propose a deep reinforcement relevance network (DRRN, see [DRLNLAS] Figure 1c). They use separate neural networks to model the state space and the action space. This split allows them to model the “relevance” of an action to a given state through an interaction function that combines the outputs of the two networks. This interaction function is used as their Q-function. The general assumption is that the text describing the state of the game tends to be very different from the text of the commands, so two different kinds of meaning representation are learned. The learning algorithm they propose is designed so that the Q-function learns to give high values to “good” actions and low values to “bad” actions, where good and bad are determined by a reward function (rewards are usually given upon completion of in-game quests, or by tracking the player’s health points, experience points, etc.). At the same time, the DRRN learns a continuous representation of the action space, which lets it handle very large action spaces.
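The two-network idea can be sketched in a few lines. This is not the authors' code: the hashed bag-of-words features, the single tanh layer per network, the sizes DIM and BUCKETS, and the example texts are all assumptions of mine, and the weights are random rather than trained. The only part taken from the description above is the shape of the computation: one network for the state text, another for the action text, and an interaction function (here an inner product) that turns the two vectors into a Q-value.

```python
import math
import random

DIM = 8        # size of the shared embedding space (assumed)
BUCKETS = 64   # number of hashed bag-of-words features (assumed)

def bag_of_words(text):
    """Count tokens into BUCKETS features using a simple deterministic hash."""
    vec = [0.0] * BUCKETS
    for token in text.lower().split():
        vec[sum(ord(ch) for ch in token) % BUCKETS] += 1.0
    return vec

def make_layer(rng, n_in, n_out):
    """Random, untrained weights for a single layer."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def embed(weights, features):
    """One tanh layer mapping text features to a DIM-sized vector."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]

def q_value(state_net, action_net, state_text, action_text):
    """Interaction function: inner product of state and action embeddings."""
    s = embed(state_net, bag_of_words(state_text))
    a = embed(action_net, bag_of_words(action_text))
    return sum(si * ai for si, ai in zip(s, a))

rng = random.Random(0)
state_net = make_layer(rng, BUCKETS, DIM)    # models the state text
action_net = make_layer(rng, BUCKETS, DIM)   # separate network for the action text

state = "You are standing in a dark hall. A door leads east."
actions = ["go east", "open the door", "pick up the torch"]
scores = {a: q_value(state_net, action_net, state, a) for a in actions}
best = max(scores, key=scores.get)
print(best, scores)
```

Note that the list of actions can have any length and any wording, because each action is scored from its own text instead of from a fixed output slot.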

VikingMUD is a parser-based text game with a finite command set. In this case, a fixed-action-set DQN can be enough, as proposed in [LUTGDRL]. They call their network LSTM-DQN. This network contains two modules. The first one, called the representation generator, uses Long Short-Term Memory (LSTM) networks (see [LUTGDRL] Figure 2), which have proved very effective at handling the sequential nature of text (they can connect and recognize long-range patterns between words), and automatically learn useful representations for arbitrary text descriptions. The output is a vector representing the given state. This vector is fed into the second module of the network, called the action scorer, which produces scores for the set of possible actions given the current state. Since the action set is finite (and not very big), they consider all possible actions and objects available in the game and predict both for each state, using the same network ([LUTGDRL] Figure 2). One advantage of the two-module architecture of LSTM-DQN is that the authors achieve transfer learning. That is, the representations learned in the representation generator can be transferred to new game worlds. To test the agent, the authors used Evennia, an open-source library for building online textual MUD games in Python.
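Here is a rough sketch of that two-module layout, written in plain Python so the data flow is visible. Treat everything specific in it as an assumption: the verbs, the objects, the toy sizes EMB and HID, the random untrained weights, the omitted bias terms, and the mean-pooling of the LSTM hidden states are my simplifications, not details confirmed by the paper's code.

```python
import math
import random

ACTIONS = ["go", "take", "eat"]        # hypothetical command verbs
OBJECTS = ["east", "apple", "sword"]   # hypothetical objects
EMB, HID = 4, 5                        # toy embedding and hidden sizes (assumed)
rng = random.Random(0)

def matrix(rows, cols):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One weight matrix per LSTM gate, applied to [word vector ; previous hidden state].
GATES = {name: matrix(HID, EMB + HID) for name in ("input", "forget", "output", "cell")}
W_ACTION = matrix(len(ACTIONS), HID)   # action scorer head for verbs
W_OBJECT = matrix(len(OBJECTS), HID)   # action scorer head for objects
embeddings = {}                        # word -> random vector, filled lazily

def word_vector(word):
    if word not in embeddings:
        embeddings[word] = [rng.gauss(0.0, 0.3) for _ in range(EMB)]
    return embeddings[word]

def represent(text):
    """Representation generator: LSTM over the words, then mean of hidden states."""
    h, c, total = [0.0] * HID, [0.0] * HID, [0.0] * HID
    words = text.lower().split()
    for word in words:
        x = word_vector(word) + h
        i = [sigmoid(v) for v in matvec(GATES["input"], x)]
        f = [sigmoid(v) for v in matvec(GATES["forget"], x)]
        o = [sigmoid(v) for v in matvec(GATES["output"], x)]
        g = [math.tanh(v) for v in matvec(GATES["cell"], x)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        total = [t + hj for t, hj in zip(total, h)]
    return [t / max(1, len(words)) for t in total]

def score(text):
    """Action scorer: one score per verb and one per object from the same state vector."""
    v = represent(text)
    return matvec(W_ACTION, v), matvec(W_OBJECT, v)

q_act, q_obj = score("You are hungry. There is an apple on the table.")
command = ACTIONS[q_act.index(max(q_act))] + " " + OBJECTS[q_obj.index(max(q_obj))]
print(command)
```

With untrained weights the chosen command is meaningless, of course. The point is the split: `represent` knows nothing about commands, which is why its learned parameters could in principle be reused in another game world while only the scorer is retrained.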

As you can see, the approaches of these two articles are quite different. Both methods manage to create successful artificial agents that can play MUD games. Their source code is available at the following addresses:

[LUTGDRL]: https://people.csail.mit.edu/karthikn/mud-play/

[DRLNLAS]: https://github.com/jvking/text-games

So which approach is better for us? Well, it is hard to say without testing both. The authors of [DRLNLAS] tried DRRN vs LSTM-DQN in a Fantasy World created by the developers of Evennia, and LSTM-DQN outperformed DRRN. But it is difficult to say which would win if the two models were pitted against each other playing VikingMUD. The action space of Fantasy World has only 222 commands (every combination of the 6 actions and 37 objects in the game, 6 × 37 = 222). VikingMUD is much more complicated, with richer text descriptions of the world and many more commands to execute. Furthermore, DRRN seems to be better at handling actions described in more complex language, which looks like an advantage for playing VikingMUD.
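The difference in scale is easy to check with a few lines of code. The placeholder verb and object names and the helper `sentence_count` are mine; the second half simply counts every possible word sequence, so it is an upper bound (most sequences are not valid commands), and the vocabulary size of 100 is an arbitrary assumption.

```python
from itertools import product

# Fixed action set: every verb paired with every object (placeholder names).
verbs = [f"verb_{i}" for i in range(6)]
objects = [f"object_{j}" for j in range(37)]
commands = [f"{v} {o}" for v, o in product(verbs, objects)]
print(len(commands))  # 222

def sentence_count(vocab_size, length):
    """Upper bound on free-text commands: all word sequences of a given length."""
    return vocab_size ** length

# Free-text actions grow exponentially with sentence length.
for length in (1, 2, 3, 4):
    print(length, sentence_count(100, length))
```

A network with 222 output scores is perfectly manageable. One output per possible sentence is not, which is the motivation for scoring actions from their text instead.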

Complications

Let’s assume for now that we have an agent, already trained as in one of the articles above. How far are we from a solution to the challenge? Well, we would have to work on the following:

Problem 1. Get a training area in VikingMUD. Here we would have to work together with the developers of VikingMUD to be able to access an environment where we can train and play with our AI agent. Maybe we could get our own area where real players could come and help us train our agent.

Problem 2. Find a way to introduce the rules of the game into the model. For example, our agent must not kill other players, or it will be banned from the game. Maybe it is as easy as teaching our agent the rules in plain English?

Problem 3. Train our agent to a human-like level. For text-based games we are not there yet. Here one would have to replicate the kind of success attained in, for example, playing Atari games. Maybe one could sprinkle our agent with a bit of curiosity to make it more human-like?

Problem 4. Train our agent to talk to other humans. VikingMUD is a social game. Players like to talk to each other, so it would be desirable for our agent to be able to hold a sensible conversation with other players.
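For Problem 2, one crude alternative to teaching the rules in plain English is to hard-code them as a filter in front of the agent, a technique usually called action masking. The sketch below is only an illustration of that idea: the rule itself, the verbs, the player and monster names and the Q-values are all made up, and a real rule set would need far more than a verb check.

```python
def is_allowed(command, players_present):
    """Hypothetical rule: never attack another player."""
    verb, _, target = command.partition(" ")
    return not (verb in {"kill", "attack"} and target in players_present)

def choose(q_values, players_present):
    """Pick the highest-scoring command among those the rules allow."""
    allowed = {c: q for c, q in q_values.items() if is_allowed(c, players_present)}
    if not allowed:
        return None  # nothing permitted; the caller has to decide what to do
    return max(allowed, key=allowed.get)

# Made-up Q-values: the agent would like to attack the player "olaf" most of all.
q_values = {"kill olaf": 2.5, "kill troll": 1.7, "go east": 0.3}
print(choose(q_values, players_present={"olaf"}))  # kill troll
```

The mask keeps the agent out of trouble without any retraining, but it teaches the agent nothing: the Q-function still rates the forbidden action highly.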

Conclusion

In this post, we have dipped a toe into deep reinforcement learning. I described the approaches taken by two recent papers on playing text-based games with machine learning, and walked you through their network architectures. Hopefully, the rest of the details will now be easier to pick up from the papers themselves. Unfortunately, there are still a lot of complications standing between us and a solution to the VikingMUD challenge. But lots of progress is being made by the hour!

Feel free to play with the code from the GitHub links and make sure you explore the embedded links along the text for a better reading experience. Leave a comment and let me know what you think about this post. Thank you for reading!

Further Reading:

Here is a curated list of my top links for deep learning:

Deep Reinforcement Learning: An Overview

An overview of RL, including games, in particular, AlphaGo, robotics, spoken dialogue systems (a.k.a. chatbot), machine translation, text sequence prediction, neural architecture design, personalized web services, healthcare, finance, and music generation.

https://deepmind.com/blog/deep-reinforcement-learning/

From the pioneers who introduced the first successful algorithm using deep reinforcement learning.

http://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/

A very nice starting point to get into deep learning. Look at the links at the end of this article.

http://rll.berkeley.edu/deeprlcourse/

9.520 MIT Statistical Learning – Theory and Applications

More advanced courses for getting to know the theory behind deep learning.

Why deep learning works so well

A nice video showing some links between physics, neuroscience and deep learning. Maybe the reason why deep learning works so well is connected to the nature of reality?

https://www.reddit.com/r/MachineLearning/

The one link to rule them all.

Credits

I’d like to thank Asgeir Steine and Dominique Bye-Ribaut for the comments and suggestions on the drafts of this post, and Julie Eléni Solhaug for the animation of the picture from the game.

References:

[Watkins and Dayan 1992] Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine Learning, 8(3-4):279-292.

[DRLNLAS] Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng and Mari Ostendorf. 2016. Deep Reinforcement Learning with a Natural Language Action Space.

[LUTGDRL] Karthik Narasimhan, Tejas D . Language Understanding for Text-based Games using Deep Reinforcement Learning.