ConvNetJS Deep Q Learning Demo

Description

This demo follows the Deep Q Learning algorithm described in Playing Atari with Deep Reinforcement Learning, a paper from the NIPS 2013 Deep Learning Workshop by DeepMind. The paper is a nice demonstration of a fairly standard (model-free) Reinforcement Learning algorithm (Q Learning) learning to play Atari games.

In this demo, instead of Atari games, we'll start out with something simpler: a 2D agent that has 9 eyes pointing in different directions ahead, where each eye senses 3 values along its direction (up to a certain maximum visibility distance): distance to a wall, distance to a green thing, and distance to a red thing. The agent navigates by using one of 5 actions that turn it by different angles. The red things are apples and the agent gets a positive reward for eating them. The green things are poison and the agent gets a negative reward for eating them. Training takes a few tens of minutes with the current parameter settings.
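To make the input dimensions concrete, here is a small illustrative sketch of how the 9 eyes times 3 sensed values could be flattened into the 27-number state array the network consumes. The names eyes, wall, green and red are assumptions for illustration, not identifiers from the actual demo source:

// Illustrative sketch: pack 9 eyes x 3 sensed proximities into one flat state array.
// The identifiers below are hypothetical, not from the demo source.
var state = [];
for(var i = 0; i < 9; i++) {
  var eye = eyes[i];     // one of the 9 directional sensors
  state.push(eye.wall);  // proximity to nearest wall along this direction
  state.push(eye.green); // proximity to nearest green thing (poison)
  state.push(eye.red);   // proximity to nearest red thing (apple)
}
// state.length is 27, matching num_inputs in the configuration below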

Over time, the agent learns to avoid actions that lead to low-reward states, and instead picks actions that lead to better states.
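Concretely, this is just the standard one-step Q-learning target from the paper above, not anything specific to this demo: the network's output for the action taken is regressed toward

y = r + \gamma \max_{a'} Q(s', a')

where r is the observed reward, \gamma is the discount factor (opt.gamma in the configuration below), and s' is the next state. States from which little reward is achievable therefore drag down the value of every action that leads into them.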

Q-Learner full specification and options

The textfield below gets eval()'d to produce the Q-learner for this demo. This lets you fiddle with the various parameters and settings, and also shows how you can use the API for your own purposes. All of these settings are optional but are listed to give an idea of the possibilities. Feel free to change things around and hit reload! Documentation for all options can be found in the paper linked above, and there are also comments for every option in the source code javascript file.

var num_inputs = 27; // 9 eyes, each sees 3 numbers (wall, green, red thing proximity)
var num_actions = 5; // 5 possible angles agent can turn
var temporal_window = 1; // amount of temporal memory. 0 = agent lives in-the-moment :)
var network_size = num_inputs*temporal_window + num_actions*temporal_window + num_inputs;

// the value function network computes a value of taking any of the possible actions
// given an input state. Here we specify one explicitly the hard way
// but user could also equivalently instead use opt.hidden_layer_sizes = [20,20]
// to just insert simple relu hidden layers.
var layer_defs = [];
layer_defs.push({type:'input', out_sx:1, out_sy:1, out_depth:network_size});
layer_defs.push({type:'fc', num_neurons: 50, activation:'relu'});
layer_defs.push({type:'fc', num_neurons: 50, activation:'relu'});
layer_defs.push({type:'regression', num_neurons:num_actions});

// options for the Temporal Difference learner that trains the above net
// by backpropping the temporal difference learning rule.
var tdtrainer_options = {learning_rate:0.001, momentum:0.0, batch_size:64, l2_decay:0.01};

var opt = {};
opt.temporal_window = temporal_window;
opt.experience_size = 30000;
opt.start_learn_threshold = 1000;
opt.gamma = 0.7;
opt.learning_steps_total = 200000;
opt.learning_steps_burnin = 3000;
opt.epsilon_min = 0.05;
opt.epsilon_test_time = 0.05;
opt.layer_defs = layer_defs;
opt.tdtrainer_options = tdtrainer_options;

var brain = new deepqlearn.Brain(num_inputs, num_actions, opt); // woohoo
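As the comments above note, the explicit layer_defs can also be replaced by the hidden_layer_sizes shorthand. A minimal sketch of the equivalent setup, assuming the same two 50-neuron relu layers as the explicit version above:

// Minimal sketch: same network shape as above via the shorthand option.
var opt = {};
opt.hidden_layer_sizes = [50, 50]; // two fully-connected relu hidden layers, 50 neurons each
// ...set the remaining opt fields as above; no layer_defs needed...
var brain = new deepqlearn.Brain(num_inputs, num_actions, opt);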


Q-Learner API

It's very simple to use deepqlearn.Brain. First, initialize your network:

var brain = new deepqlearn.Brain(num_inputs, num_actions);

And to train it, proceed in a loop as follows:
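As a minimal sketch of a single iteration: only brain.forward and brain.backward are the actual API; get_sensor_readings, apply_action, and compute_reward are hypothetical environment functions standing in for your own code:

var state = get_sensor_readings();  // hypothetical: array of num_inputs numbers from your environment
var action = brain.forward(state);  // index in [0, num_actions) of the action the agent chooses
apply_action(action);               // hypothetical: act on the environment
var reward = compute_reward();      // hypothetical: e.g. positive for apples, negative for poison
brain.backward(reward);             // communicate the reward; learning happens here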