Google to launch StreetLearn dataset to teach machine learning models to navigate cities

American technology behemoth Google is gearing up to launch DeepMind’s StreetLearn dataset, which is used to teach machine learning agents to navigate cities without the help of a map.

The StreetLearn environment is built on Google Street View images. DeepMind, a UK-based artificial intelligence firm that Google acquired in 2014, has used it to train a software agent to navigate several cities using only visual cues such as landmarks, without a map or GPS coordinates.

The StreetLearn environment covers several regions within the city centers of New York, London, and Paris. It is built from cropped 360-degree Street View images of street scenes. Each image, measuring 84 x 84 pixels, is a node in a larger graph of images, with up to 65,000 nodes per 5-kilometer city region and numerous regions per city. Each region has a different urban setting, with differing numbers of bridges and parks and varying amounts of construction.
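To make the "graph of images" idea concrete, here is a minimal sketch of how panoramas could be stored as nodes and linked to their neighbors. It is not the StreetLearn API: the class names, field names, and coordinates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PanoNode:
    """One Street View panorama; every field name here is hypothetical."""
    pano_id: str
    lat: float
    lng: float
    neighbors: set = field(default_factory=set)  # ids of adjacent panoramas


class PanoGraph:
    """Undirected graph of panoramas for one city region."""

    def __init__(self):
        self.nodes = {}

    def add_node(self, pano_id, lat, lng):
        self.nodes[pano_id] = PanoNode(pano_id, lat, lng)

    def connect(self, a, b):
        # Assumption: a street segment can be walked in both directions.
        self.nodes[a].neighbors.add(b)
        self.nodes[b].neighbors.add(a)

    def step(self, current, target):
        """Move to `target` only if it is adjacent; otherwise stay put."""
        return target if target in self.nodes[current].neighbors else current


# Three made-up panoramas along one street.
graph = PanoGraph()
graph.add_node("a", 40.7580, -73.9855)
graph.add_node("b", 40.7585, -73.9850)
graph.add_node("c", 40.7590, -73.9845)
graph.connect("a", "b")
graph.connect("b", "c")
```

In this sketch an agent standing at "a" can step to "b" but not directly to "c", which mirrors how movement in a graph of street images is restricted to linked nodes.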

The StreetLearn agent learns to navigate the cities through deep reinforcement learning, an approach that combines reinforcement learning with multi-layered neural networks.

In reinforcement learning, a software agent learns which actions to take in order to maximize a reward. In StreetLearn’s case, the agent’s goal is to get as close as possible to a given landmark.
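One way to picture a reward for getting close to a landmark is a function that pays more as the distance to the goal shrinks. The sketch below is an assumption, not DeepMind's actual reward: the linear shape, the 200-meter radius, and the function names are hypothetical. It also assumes that the environment, not the agent, has access to coordinates when scoring a move.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def goal_reward(agent, goal, radius_m=200.0):
    """Hypothetical shaped reward: 1.0 at the goal, falling
    linearly to 0.0 at radius_m and beyond."""
    distance = haversine_m(*agent, *goal)
    return max(0.0, 1.0 - distance / radius_m)


goal = (40.7580, -73.9855)    # made-up landmark position
near = (40.7585, -73.9855)    # roughly 55 m north of the goal
far = (41.7580, -73.9855)     # roughly 111 km north of the goal
```

With these made-up points, `goal_reward(near, goal)` is larger than `goal_reward(far, goal)`, so an agent that maximizes reward is pushed toward the landmark.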

The StreetLearn agent uses three neural networks. A convolutional neural network handles image recognition and feeds its output to two Long Short-Term Memory (LSTM) networks. One LSTM is a policy network that decides which action the agent should take next based on its present reward state. The other is a locale-specific LSTM tasked with memorizing the local environment and with learning a representation of the agent’s present location and of the position of the destination.

This three-network structure allowed the company to build an agent that could transfer what it had learned from one city to another. To avoid retraining the whole agent in a new city, the company froze the convolutional neural network and the policy LSTM, which together make up most of the agent. Only the locale-specific LSTM, which memorizes the local environment, is trained from scratch when moving to a new city.
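The freezing step can be illustrated with a toy example. The sketch below uses plain Python lists in place of real neural networks, so it shows only the bookkeeping: which parts receive updates in a new city and which do not. The `Module` class, the `transfer_to_new_city` helper, and the dummy gradients are hypothetical and are not taken from DeepMind's code.

```python
import random


class Module:
    """Stand-in for a neural network: a list of weights plus a trainable flag."""

    def __init__(self, name, size, seed):
        rng = random.Random(seed)
        self.name = name
        self.weights = [rng.uniform(-1, 1) for _ in range(size)]
        self.trainable = True

    def apply_gradient(self, grads, lr=0.1):
        # Frozen modules ignore updates entirely.
        if not self.trainable:
            return
        self.weights = [w - lr * g for w, g in zip(self.weights, grads)]


def transfer_to_new_city(conv, policy_lstm, size, seed):
    """Freeze the shared networks and return a fresh locale-specific module."""
    conv.trainable = False
    policy_lstm.trainable = False
    return Module("locale_lstm", size, seed)


# Networks as they stand after training in the first city.
conv = Module("conv", 4, seed=0)
policy = Module("policy_lstm", 4, seed=1)
old_locale = Module("locale_lstm", 4, seed=2)

# Moving to a second city: only the new locale module will learn.
new_locale = transfer_to_new_city(conv, policy, size=4, seed=3)
conv_before = list(conv.weights)
policy_before = list(policy.weights)
locale_before = list(new_locale.weights)

dummy_grads = [1.0, 1.0, 1.0, 1.0]
for module in (conv, policy, new_locale):
    module.apply_gradient(dummy_grads)
```

After the update loop, the weights of `conv` and `policy` are unchanged while those of `new_locale` have moved, which is the behavior the article describes for a city-to-city transfer.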
