WHY THIS MATTERS IN BRIEF DeepMind is trying to build a new Artificial General Intelligence architecture by creating an adaptable, massive neural network that draws on thousands of other “sub” neural networks.

On January 20th Google’s DeepMind division, the team behind a myriad of artificial intelligence (AI) firsts, quietly submitted a paper to arXiv entitled “PathNet: Evolution Channels Gradient Descent in Super Neural Networks” that went mostly unnoticed.

While the research shines a spotlight on the latest trend in deep learning research – the desire to merge Modular Deep Learning, Meta-Learning and Reinforcement Learning into a single solution that leads to even more capable deep learning systems – what makes this paper special is that it’s DeepMind’s bid to become the first company to build the fabled Artificial General Intelligence (AGI).

One day, it’s said, AGI will be the basis of a new breed of machines whose overall intelligence rivals our own – that is to say, they will be able to perform any intellectual task that humans can. On its own that would be quite an accomplishment, but longer term it will also put more pressure on jobs, a topic which, in an age of rising automation, is already hitting the headlines on a daily basis.

Perhaps even more significantly though, many experts also believe that once we achieve AGI then Artificial Super Intelligence (ASI) won’t be far behind – and at that point we’ll be the intellectual equivalent of, in Elon Musk’s own words, “a pet cat,” to our new machine overlords. At the moment analysts, who’ve been wrong about almost everything else, from autonomous cars to virtual reality (yes, it’s an analysts versus futurists thing, and you can guess where I sit), think that the first AGI machine will appear in the mid-2030s. Now that DeepMind have a plan, though, we could see that date come forward, possibly by five to ten years, to the late 2020s.

“For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks,” says the paper.

Catchy.

In short, unlike more traditional monolithic deep learning systems, which rely on a single neural network, PathNet will rely on a collection of many neural networks, and it will train them all to perform multiple tasks.

Putting it in more human terms, this new AGI network will draw on the collective power of lots of skilled, specialised smaller neural networks, just as our own brains do, and combine them into what ostensibly will be the first AGI “brain.”

In the authors’ experiments, a network trained on a second task learned faster than if it had been trained from scratch. This is significant because it shows that the new system is doing something called “transfer learning,” where previous knowledge is reused in new ways.

An example of PathNet transfer learning

The new PathNet model includes a mix of transfer learning, continual learning and multitask learning, and it’s thought that all of these are essential to creating a more continuously adaptive network, which, in turn, will be necessary if we’re to create an AGI.

PathNet consists of layers of neural networks, where the connections between the networks in each layer are discovered using different search methods, with four networks per layer in use at a time. The paper describes two discovery algorithms, one based on a genetic, or “evolutionary,” algorithm and another based on something called A3C reinforcement learning.
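To make the evolutionary search a little more concrete, here is a minimal Python sketch of how a population of candidate paths might be evolved with a binary tournament. It is an illustration under stated assumptions rather than DeepMind’s implementation: the layer and module counts, the mutation rule, and especially `toy_fitness` (a hypothetical stand-in for “train the network along this path and measure how well it does”) are all made up for the example.

```python
import random

NUM_LAYERS = 3          # layers in the network (assumed value)
MODULES_PER_LAYER = 10  # candidate sub-networks in each layer (assumed value)
ACTIVE_PER_LAYER = 4    # sub-networks a path may use per layer

def random_path(rng):
    """A path picks ACTIVE_PER_LAYER module indices in every layer."""
    return [[rng.randrange(MODULES_PER_LAYER) for _ in range(ACTIVE_PER_LAYER)]
            for _ in range(NUM_LAYERS)]

def mutate(path, rng, rate=None):
    """Copy a path, nudging each module index with a small probability."""
    if rate is None:
        rate = 1.0 / (NUM_LAYERS * ACTIVE_PER_LAYER)  # assumed mutation rate
    child = []
    for layer in path:
        new_layer = []
        for idx in layer:
            if rng.random() < rate:
                idx = (idx + rng.randint(-2, 2)) % MODULES_PER_LAYER
            new_layer.append(idx)
        child.append(new_layer)
    return child

def toy_fitness(path):
    """Hypothetical stand-in for 'train along this path, then score it'.
    It simply rewards paths that use low-numbered modules."""
    return -sum(sum(layer) for layer in path)

def tournament_step(population, fitness, rng):
    """Compare two random paths; overwrite the loser with a mutated winner."""
    a, b = rng.sample(range(len(population)), 2)
    if fitness(population[a]) >= fitness(population[b]):
        winner, loser = a, b
    else:
        winner, loser = b, a
    population[loser] = mutate(population[winner], rng)

def evolve(generations=2000, pop_size=16, seed=0):
    """Run the tournament repeatedly and return (starting best score, best path)."""
    rng = random.Random(seed)
    population = [random_path(rng) for _ in range(pop_size)]
    initial_best = max(toy_fitness(p) for p in population)
    for _ in range(generations):
        tournament_step(population, toy_fitness, rng)
    return initial_best, max(population, key=toy_fitness)
```

Because the winner of each tournament is left untouched, the best score in the population can never get worse from one step to the next. In a real system the fitness call would be the expensive part, since every evaluation means training the sub-networks on the chosen path.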

The authors’ inspiration for the new system apparently came from the “Outrageously Large Neural Networks” project from Google Brain, which is described as follows:

“Achieving greater than 1,000x improvements in [AI] model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks. A trainable gating network determines a sparse combination of these experts to use for each example. We apply the MoE to the tasks of language modelling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. We present model architectures in which a MoE with up to 137 billion parameters is applied convolutionally between stacked LSTM layers.”
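The “sparse combination of experts” idea in that description can be sketched in a few lines of plain Python. This is a simplified illustration, not the Google Brain code: the function names, the use of single numbers as expert outputs, and the plain top-k selection are assumptions made for the example, and it leaves out everything a real MoE layer needs for training.

```python
import math

def top_k_gates(logits, k):
    """Keep the k largest gate scores, softmax over them, zero out the rest."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]

def moe_forward(x, gate_weights, experts, k=2):
    """Route input vector x through only the k experts the gate selects."""
    # One gate score per expert: a dot product of that expert's gate row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    gates = top_k_gates(logits, k)
    output = 0.0
    for gate, expert in zip(gates, experts):
        if gate > 0.0:  # experts with a zero gate are never evaluated
            output += gate * expert(x)
    return output, gates
```

The point of the design is in the last loop: however many experts exist, only `k` of them do any work for a given input, which is how capacity can grow without the compute bill growing at the same rate.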

The new PathNet architecture might also give others in the field a roadmap for creating more adaptable deep learning architectures, most of which today are rigid once trained and deployed – unlike biological brains, which learn continuously. With PathNet, on the other hand, new neural networks could be taught new skills while drawing on other neural networks that have already had similar, or complementary, types of training – allowing the whole system to learn much faster.
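One way to picture how earlier training gets reused is to lock the sub-networks that worked best on the first task and reset the rest before the second task begins. The sketch below assumes that freeze-and-reset scheme; the data layout (a dictionary of weight lists keyed by layer and module index) and the name `freeze_and_reset` are hypothetical and chosen only to keep the example short.

```python
import random

def freeze_and_reset(modules, best_path, rng):
    """Lock the modules on the winning path and reinitialise all the others.

    modules:   dict mapping (layer, index) -> list of weights (hypothetical layout)
    best_path: one list of chosen module indices per layer
    Returns the set of frozen (layer, index) keys.
    """
    frozen = {(layer, idx)
              for layer, chosen in enumerate(best_path)
              for idx in chosen}
    for key in list(modules):
        if key not in frozen:
            # Fresh small random weights, so the next task starts these from scratch.
            modules[key] = [rng.uniform(-0.1, 0.1) for _ in modules[key]]
    return frozen
```

A training loop for the next task would then skip weight updates for any key in the returned set, so the new task can build on the old skills without overwriting them.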

Furthermore, as these learning systems improve and become capable of doing new things, it’s also likely that in the future they’ll be able to use less computing power, which means that the whole cycle will accelerate as they draw on the skills of thousands, or maybe even millions, of “sub” neural networks at a time.

Either way, perhaps AGI just took a giant leap forwards.