“What I cannot create, I do not understand.” - Richard Feynman.

On intelligence: its creation and understanding

The first glimmers of human-like intelligence appeared a few million years ago on the African continent and continued to evolve, eventually culminating in the brain of our species, Homo sapiens, about 100,000 years ago. As modern humans, we can only imagine what our ancient ancestors experienced as they peered out into the night sky to contemplate the very nature of physical reality, as well as introspectively peered within themselves to ponder the very nature of their own mental reality. In the last few hundred years, our species has made immense intellectual progress in developing a precise understanding of physical reality, by discovering fundamental mathematical laws governing the behavior of space, time, matter and energy, now codified in the grand frameworks of quantum mechanics and general relativity. However, we are at the very beginnings of our quest to understand the nature of our mental reality. In particular, how does human intelligence emerge from the biological wetware of 100 billion neurons connected by 100 trillion synapses? The modern disciplines of neuroscience, psychology and cognitive science have made important progress over the last 100 years, laying the foundations for attacking this grand question. Indeed, as President Obama explained at the announcement of the United States Brain Initiative in 2013, the time is now ripe to “unlock the mystery of the three pounds of matter that sits between our ears.”

But when it comes to our own mental capabilities, for modern humans it is not enough to simply understand them. We also feel a deep desire to recreate these capabilities in inanimate systems, sometimes fashioned after our own image. In essence, humans, as products of evolution, sometimes yearn to play the role of creator. This yearning permeates human literature in works ranging from Mary Shelley’s Frankenstein to Isaac Asimov’s I, Robot, which first codified the three laws of robotics. Indeed, the nascent field of artificial intelligence (AI), often in collaboration with the fields of neuroscience, psychology and cognitive science, has made tremendous progress in creating machines that exhibit some human-like capabilities. In this post, I will explore a little further how AI, neuroscience, psychology and cognitive science, along with allied disciplines in the mathematical, physical and social sciences, have worked together in the past, and will continue to work together in the future, in pursuit of the intertwined quest to understand and create intelligent systems.

A productive collaboration between the biological and artificial

Over the last 60 or so years, as AI developed, it was heavily influenced by and indeed inspired by neuroscience and psychology. In earlier decades, many AI practitioners were well studied in neuroscience and psychology. Here I provide a selection of past interactions between neuroscience, psychology and AI:

The very idea that distributed networks of relatively simple elements (neurons) can be capable of the remarkable computations underlying human intelligence originated in neuroscience and now permeates modern AI systems in the form of neural networks. This idea was not always obvious and it became firm only about a hundred years ago after the famous debates between Golgi and Cajal.

Various dimensionality reduction techniques, including multi-dimensional scaling and factor analysis, were originally developed in the context of psychometrics research.

The famous neuroscientist Horace Barlow originated the idea of factorized codes, which in turn inspired independent components analysis (ICA) and current research in AI aiming to disentangle independent factors of variation in data.

Tolman’s work on cognitive maps provided evidence that even rats form mental models of the world and can use these models to plan and navigate. This cemented the idea of internal model formation as a key component of animal intelligence, a component which currently lies at the frontiers of AI research.

The Hopfield network, a model in theoretical neuroscience that provided a unified framework for thinking about distributed, content-addressable memory storage and retrieval, also inspired the Boltzmann machine, which in turn provided a key first step in demonstrating the success of deep neural network models and inspired the idea of distributed satisfaction of many weak constraints as a model of computation in AI.

Critical ingredients underlying deep convolutional networks currently dominating machine vision were directly inspired by the brain. These ingredients include hierarchical visual processing in the ventral stream, suggesting the importance of depth; the discovery of retinotopy as an organizing principle throughout visual cortex, leading to convolution; the discovery of simple and complex cells motivating operations like max pooling; and the discovery of neural normalization within cortex, which motivated various normalization stages in artificial networks.

Seminal work on sparse coding as an attempt to understand the origin of oriented edge detectors in the primary visual cortex led to sparse coding as a fundamental building block in modern AI systems.

Algorithms like temporal difference learning, which are now foundational in the field of reinforcement learning, were inspired by animal experiments on classical conditioning.

In turn, reinforcement learning had a dramatic impact on the interpretation of basal ganglia operation, in which dopaminergic neurons provide the basal ganglia with the all-important reward prediction error signal that drives learning in many reinforcement learning algorithms.

The modularity of memory systems in the brain inspired modern memory neural networks, which separate, to a degree, the operations of memory storage from the executive control circuitry that decides when to read from and write to memory.

The human attentional system inspired the incorporation of attentional neural networks that can be trained to dynamically attend to or ignore different aspects of their state and inputs to make future computational decisions.

The development of formal generative grammars in linguistics and cognitive science led to the development of probabilistic grammars and parsing in CS and AI.

Modern regularization techniques like dropout were inspired by the intrinsic stochasticity of neural dynamics.
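To make the link between temporal difference learning and the reward prediction error a little more concrete, here is a minimal sketch of tabular TD(0) on a toy conditioning trial. The trial structure (five time steps, a single reward at the end), the learning rate, and the function names are all illustrative assumptions, not a model taken from the experiments mentioned above.

```python
def td0_trial(values, rewards, alpha=0.1, gamma=1.0):
    """Run one trial of tabular TD(0) over a fixed sequence of time steps.

    values[t] is the predicted future reward at step t (updated in place);
    rewards[t] is the reward received on leaving step t.
    Returns the reward prediction error at each step.
    """
    errors = []
    for t in range(len(values)):
        # The trial ends after the last step, so nothing is predicted beyond it.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # Reward prediction error: what happened minus what was expected.
        delta = rewards[t] + gamma * next_value - values[t]
        values[t] += alpha * delta
        errors.append(delta)
    return errors


def run_conditioning(n_steps=5, n_trials=500):
    """Repeat the same trial many times (hypothetical toy experiment)."""
    values = [0.0] * n_steps
    rewards = [0.0] * (n_steps - 1) + [1.0]  # reward only at the final step
    first_errors = td0_trial(values, rewards)
    last_errors = first_errors
    for _ in range(n_trials - 1):
        last_errors = td0_trial(values, rewards)
    return values, first_errors, last_errors


values, first_errors, last_errors = run_conditioning()
```

On the first trial the only nonzero error occurs when the unexpected reward arrives. After many repetitions the earlier steps come to predict the reward, and the error at reward time shrinks toward zero, which is the qualitative signature that made this family of algorithms a natural lens for interpreting reward-related neural signals.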

Biological inspiration for the future of AI

Despite the remarkable commercial success of current AI systems on supervised pattern recognition tasks, we still have a long way to go in mimicking truly human-like intelligence. Here I outline my personal view of some directions in which the fields of biological and artificial intelligence may advance hand in hand going forward. These directions are by no means exhaustive, and on the HAI blog we will explore many more ideas in the future.

Biologically plausible credit assignment

The credit assignment problem is likely one of the biggest open questions in both neuroscience and AI. Stated dramatically, suppose you are playing tennis and you see that you hit the ball incorrectly. Which of your 100 trillion synapses is to blame? And how does the brain specifically find and correct the right set of synapses in your motor system, especially when the error is delivered through the visual system hundreds of milliseconds after the error occurred? In AI, this credit assignment problem is solved in many cases through backpropagation of error through multiple layers of computation. However, it is unclear how the brain solves this problem. What is true is that the brain solves it using a local learning rule: that is, every synapse adjusts its strength using only information that is physically available to it, for example the electrical activity of the two neurons connected by the synapse, the strength of other synapses nearby, and any neuromodulatory inputs reflecting rewards and errors. Elucidating what such local synaptic rules are and how they work could have a dramatic impact in AI, leading to embarrassingly parallel implementations of learning on neuromorphic chips that avoid the communication overheads of backpropagation. But more generally, the very identification of a common unsolved problem plaguing both neuroscience and AI should motivate progress by bringing together synaptic physiologists, computational neuroscientists and AI practitioners to collectively crack the problem of biologically plausible credit assignment. Such a combination of experimental knowledge, theory, and engineering know-how is likely needed to successfully address this grand challenge.
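As a purely illustrative sketch of what "local" means here, the code below trains a single linear neuron with a reward-modulated perturbation rule (often called node perturbation). Each synapse updates using only its own presynaptic input, a noise perturbation of the postsynaptic neuron, and one scalar signal broadcast to all synapses. This is a swapped-in textbook-style rule chosen for simplicity, not a claim about what the brain actually does; the task, the parameter values, and all function names are assumptions.

```python
import random


def output(weights, pre):
    """Noiseless output of a linear neuron given presynaptic activity."""
    return sum(w * x for w, x in zip(weights, pre))


def train_local_rule(true_weights, steps=5000, lr=2.0, sigma=0.1, seed=0):
    """Learn to match a hypothetical teacher neuron without backpropagation."""
    rng = random.Random(seed)
    weights = [0.0] * len(true_weights)
    for _ in range(steps):
        pre = [rng.uniform(-1.0, 1.0) for _ in true_weights]  # presynaptic activity
        target = output(true_weights, pre)
        clean = output(weights, pre)
        noise = rng.gauss(0.0, sigma)  # random perturbation of the postsynaptic output
        # Global scalar (a stand-in for a neuromodulator): did the perturbation
        # reduce the squared error relative to the unperturbed output?
        reward_change = (clean - target) ** 2 - (clean + noise - target) ** 2
        # Local update: synapse i sees only pre[i], the perturbation, and the scalar.
        for i, x in enumerate(pre):
            weights[i] += lr * reward_change * noise * x
    return weights


def mean_squared_error(weights, true_weights, n=200, seed=1):
    """Average squared output mismatch on freshly sampled inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        pre = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        total += (output(weights, pre) - output(true_weights, pre)) ** 2
    return total / n


true_weights = [0.5, -0.3, 0.8]
learned = train_local_rule(true_weights)
error_before = mean_squared_error([0.0, 0.0, 0.0], true_weights)
error_after = mean_squared_error(learned, true_weights)
```

On average this update follows the same direction as gradient descent on the squared error, but it gets there through noisy trial and error. That noise grows with the number of neurons being perturbed, which is one reason such simple rules are not considered a full answer to credit assignment in deep networks.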

Incorporating synaptic complexity

A major divergence between biological and artificial neural models lies in the very way we model synapses connecting neurons. In artificial networks, synapses are modelled by a single scalar value reflecting a multiplicative gain factor transforming how the presynaptic neuron’s input affects the postsynaptic neuron’s output. In contrast, every biological synapse has hiding within it immensely complicated molecular signaling pathways [1]. For example, hippocampal synapses underlying our memory of recent events each contain a chemical reaction network of hundreds of different types of molecules capable of implementing an entire dynamical system with sophisticated temporal processing capabilities [2].

Upon seeing such complexity, a theorist or engineer may be tempted to simply ignore it as biological messiness arising as an accident of evolution. However, theoretical studies have shown that such synaptic complexity may indeed be essential to learning and memory [3]. In fact, network models of memory in which synapses have finite dynamic range require such synapses to be dynamical systems in their own right, with complex temporal filtering properties, to achieve reasonable network memory capacities [4]. Moreover, more intelligent synapses have recently been explored [5] in AI as a way to solve the catastrophic forgetting problem, in which a network trained to learn two tasks in sequence can only learn the second task, because learning the second task changes synaptic weights in such a way as to erase knowledge gained from learning the first task.

More generally, it is likely that our current AI systems are leaving major performance gains on the table by ignoring the dynamical complexity of biological synapses. Just as we have added spatial depth to our networks to achieve complex hierarchical representations, we may also need to add dynamical depth to our synapses to achieve complex temporal learning capabilities.
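The catastrophic forgetting problem described above, and the idea that a synapse might carry more state than one scalar, can be illustrated with a deliberately tiny example. Here each of two synapses stores, besides its weight, an anchor value and an importance score, and learning a second task is penalized for moving important synapses. The two toy loss functions, the hand-set importance values, and the function names are assumptions made for illustration; published methods estimate importance during training rather than fixing it by hand.

```python
def loss_a(w):
    """Task A (hypothetical): depends only on the first synapse."""
    return (w[0] - 1.0) ** 2


def loss_b(w):
    """Task B (hypothetical): depends on the sum of both synapses."""
    return (w[0] + w[1] - 3.0) ** 2


def learn_task_b(w_start, importance, strength, lr=0.01, steps=5000):
    """Gradient descent on task B plus an importance-weighted quadratic penalty.

    Each synapse k keeps extra state beyond its weight: an anchor (its value
    after task A) and an importance score for the earlier task.
    """
    w = list(w_start)
    anchor = list(w_start)
    for _ in range(steps):
        err = w[0] + w[1] - 3.0  # shared task-B error term
        grads = [
            2.0 * err + 2.0 * strength * importance[k] * (w[k] - anchor[k])
            for k in range(2)
        ]
        for k in range(2):
            w[k] -= lr * grads[k]
    return w


after_task_a = [1.0, 0.0]  # a solution of task A (loss_a is zero here)

# Plain scalar synapses: no protection for what task A learned.
plain = learn_task_b(after_task_a, importance=[0.0, 0.0], strength=0.0)

# Synapses with extra state: the first synapse is marked important for task A.
protected = learn_task_b(after_task_a, importance=[1.0, 0.0], strength=10.0)
```

In this toy setting both runs solve task B, but the plain run drifts to roughly (2, 1) and loses task A, while the protected run settles near (1, 2) and keeps it. The extra per-synapse variables do the work, which is the sense in which richer synapses may help with sequential learning.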

Taking cues from systems-level modular brain architecture

Often, current commercial AI systems involve training networks with relatively homogeneous layered or recurrent architectures starting from a tabula rasa of random weights. However, this may be too hard a problem to solve for more complex tasks. Indeed, biological evolution has taken a very different path. The last common ancestor of all vertebrates lived 500 million years ago. Its rudimentary brain has been evolving ever since, leading to the mammalian brain about 100 million years ago, and the human brain a few million years ago. This unbroken chain of evolution has led to an intricate brain structure with highly conserved computational elements, and immense system-level modularity. In fact, we currently lack any engineering design principles that can explain how a complex sensing, communication, control and memory network like the brain can continuously scale in size and complexity over 500 million years while never losing the ability to adaptively function in dynamic environments. Thus it may be very interesting for AI to take cues from the systems-level structure of the brain.

One key systems property is modularity at both a functional and an anatomical level. The brain is not homogeneous like our current AI architectures, but has different modules, like the hippocampus (subserving episodic memory and navigation), the basal ganglia (underlying reinforcement learning and action selection), and the cerebellum (thought to automatize skilled motor control and higher level cognition through supervised learning). Moreover, memory systems (habitual memories, motor skills, short-term memory, long-term memory, episodic memory, semantic memory) in the human brain are also functionally modular; different patients can have deficits in one type of memory without deficits in the others. Also, in the motor system nested feedback loop architectures predominate [6,7], with simple fast loops implementing automatic motor corrections in 20 ms through the spinal cord, slightly slower smarter loops implementing more sophisticated motor corrections over 50 ms through the motor cortex, and finally visual feedback flowing through the entire brain implementing conscious corrections of motor errors. Finally, a major feature of all mammalian brains is a neocortex consisting of a large number of relatively similar 6-layered cortical columns, all thought to implement variations on a single canonical computational module [8].

Overall, the remarkable modularity of the modern mammalian brain, conserved across species separated by 100 million years of independent evolution, suggests that this systems-level modularity might be beneficial to implement in AI systems (in functional principle, if not in biological detail) and that the current approach of training neural networks from a tabula rasa is likely an infeasible path towards more general human-like intelligence. Indeed, as an example, a combination of systems-level modularity (both anatomical and functional), nested loops which segregate different types of error correction, and more dynamically sophisticated synapses may all be critical ingredients in solving the grand challenge of biologically plausible credit assignment raised above.

Unsupervised learning, transfer learning and curriculum design

Another major discrepancy between AI systems and human-like learning lies in the vastly larger amounts of labelled data required of AI systems to even approach human-level performance. For example, a recent speech recognition system [9] was trained on 11,940 hours of speech with aligned transcriptions. If we both saw and heard another human read text aloud to us for two hours a day, it would take us 16 years to be exposed to that dataset. AlphaGo Zero [10] played 4.9 million games of self-play to beat human Go masters. If a human played Go every day for 30 years, he or she would have to play 450 games a day to practice as much as AlphaGo Zero. Also, a recent dataset on visual question answering [11] contains 0.25M images, 0.76M questions, and ~10M answers. If we received answers to 100 questions about images each day, it would take us 274 years to be exposed to a dataset of this size. It is clear in all three cases that humans receive vastly smaller amounts of labelled training data, yet they can recognize speech, play Go and answer questions about images pretty well.
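For readers who want to check these back-of-the-envelope comparisons, the arithmetic can be reproduced in a few lines. The dataset sizes and daily rates are the ones quoted above; the 365-day year and the helper name are my own assumptions, and the 274-year figure is taken here to correspond to the roughly 10 million answers consumed at 100 per day.

```python
DAYS_PER_YEAR = 365  # assumption: ignore leap years


def years_of_exposure(total_amount, amount_per_day):
    """Years needed to consume total_amount at a fixed daily rate."""
    return total_amount / amount_per_day / DAYS_PER_YEAR


# Speech recognition: 11,940 hours of speech at 2 hours per day.
speech_years = years_of_exposure(11_940, 2)

# AlphaGo Zero: 4.9 million games spread over 30 years of daily play.
go_games_per_day = 4_900_000 / (30 * DAYS_PER_YEAR)

# Visual question answering: ~10 million answers at 100 per day.
vqa_years = years_of_exposure(10_000_000, 100)

print(f"speech: {speech_years:.1f} years")
print(f"Go: {go_games_per_day:.0f} games per day")
print(f"VQA: {vqa_years:.0f} years")
```

The three results come out at about 16.4 years, about 447 games per day, and about 274 years, consistent with the rounded figures in the text.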

Several keys to bridging this gap between artificial and biological intelligence lie in the human ability to learn from unlabelled data (unsupervised learning), as well as to build on strong prior knowledge gained from solving previous tasks, and to transfer that knowledge to new tasks (transfer learning). Finally, human society has set up systems of education that involve the design of carefully chosen sequences of tasks to facilitate knowledge acquisition (curriculum design). In order to efficiently instantiate these concepts in artificial systems, we need a deeper understanding and mathematical formalization of how both humans and other animals do unsupervised learning, how knowledge can be transferred between tasks [12,13], and how we can optimize curricula. Advances in these areas, which will require the interactions of computer scientists, psychologists, and educators, will likely be key to reducing the prohibitive data requirements of current AI systems. And they will be essential in empowering AI in other domains where labelled data is scarce.

Building world models for understanding, planning, and active causal learning

Much current AI success in commercial settings is achieved via supervised methods, where an AI system passively receives inputs, is told the correct output, and adjusts its parameters to match each input-output combination. Babies, in contrast, behave like active scientists interrogating their environment [14]. Consider, for example, the following experiment: through sleight of hand, a baby is shown two “magical” objects: object A, which appears to move through walls, and object B, which does not fall when dropped. The baby is given both objects to play with. The baby will specifically attempt to push object A through solid surfaces, and drop object B to see if it will fall (and not the other way around). This remarkable experiment suggests that babies act like scientists who actively interrogate their world. In particular they: (1) already have an internal model of how the physical world should behave, (2) pay attention to events which violate that world model, and (3) perform active experiments to gather further data about these violations, thereby actively choosing their own training data based on their current world model.

Thus even babies, unlike most current commercial AI systems, have remarkable capabilities to learn and exploit world models. We need further research in both neuroscience and AI on learning world models from experience, using such world models to plan (i.e., imagine different futures contingent upon current actions), and using such plans to make decisions. Such model-based planning and decision making is likely to be a powerful aid to current model-free reinforcement learning systems, which simply map world states to values, or expected future rewards. This work in AI can advance hand in hand with work in neuroscience which reveals how neural activity in animals can relate to imagined as well as actualized futures [15]. Also, fundamental drives like curiosity can be formalized into reinforcement learning systems to facilitate learning and exploration [16]. More generally, a deep understanding of multiple systems and intrinsic biological drives that facilitate both animal and human learning is likely to be highly beneficial for speeding up learning in artificial systems.
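As a minimal sketch of what "imagining different futures contingent upon current actions" can look like in code, the example below plans by rolling candidate action sequences through a world model and keeping the first action of the best imagined future. The five-position corridor, the hand-written model (standing in for one that would be learned from experience), and the function names are all hypothetical.

```python
from itertools import product

GOAL = 4  # rightmost position of a hypothetical corridor with positions 0..4


def world_model(state, action):
    """Stand-in for a learned model: predicts (next_state, reward)."""
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def plan(state, horizon=4, actions=(-1, 1), gamma=0.9):
    """Exhaustively imagine every action sequence up to the horizon.

    Returns the first action of the best imagined sequence and its
    discounted return. No real actions are taken while planning.
    """
    best_return, best_first_action = float("-inf"), None
    for sequence in product(actions, repeat=horizon):
        imagined_state, total = state, 0.0
        for t, action in enumerate(sequence):
            imagined_state, reward = world_model(imagined_state, action)
            total += (gamma ** t) * reward
        if total > best_return:
            best_return, best_first_action = total, sequence[0]
    return best_first_action, best_return


first_action, imagined_return = plan(state=2)
```

Exhaustive search like this only works for tiny problems. The design point is the separation of concerns: the model predicts consequences, and the planner queries it, so changing the goal requires no relearning of values, in contrast to a model-free system that maps states directly to expected rewards.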

Achieving energy-efficient computation in a post Moore’s law world

Another discrepancy of multiple orders of magnitude between biological and artificial systems lies in their energy expenditure. The human brain spends only 20 watts of power, while supercomputers operate in the megawatt range. In this sense, we are all literally dimmer than light bulbs! A key reason for this discrepancy likely lies in an over-reliance on digital computing itself. While the digital revolution has powered the rise of modern information technology, it may now be thought of as a suboptimal legacy technology in our forward-looking quest for achieving artificial intelligence. The reason is that digital computation requires flipping every bit at intermediate stages of a computation with extremely high reliability. However, the laws of thermodynamics then exact a considerable energetic cost for every fast and reliable bit flip [17], thereby precluding high energy efficiency.

In contrast, biological computation using molecules within cells, as well as neurons within brains, looks astoundingly noisy and imprecise. However, every intermediate step of a biological computation is just reliable enough for the final answer to be just good enough. Moreover, the brain intelligently up- or down-regulates energy costs according to the desired speed of communication (something our mobile phone processors are just starting to do). For example, consider the cost of a single bit in the brain as it travels through a target neuron [18]. It starts off as the stochastic release of a vesicle, whose contents diffuse across the space between the source neuron and target neuron at a speed of 1 millimeter per second, burning only 2.3 femtojoules (fJ). This slow speed is fine because the space between neuronal connections is only 20 nanometers. This chemical signal gets converted to a passive electrical signal that flows through the neuron cell body at a speed of 1 meter per second, burning 23 fJ to traverse about 10 micrometers. Finally, it reaches the start of the axon and gets converted to a spike, which travels at 100 meters per second along the axon, burning 6000 fJ to travel 1 cm. Thus in going from chemical to passive electrical signalling, the brain dynamically upregulates communication speed by a factor of 1000 to traverse distances that increase by a factor of roughly 500, incurring a 10-fold increase in energy expenditure. Similarly, in going from passive to active electrical signalling, the brain increases communication speed by a factor of 100 to traverse distances that increase by a factor of 1000, incurring roughly a 260-fold increase in energy expenditure.
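The scaling factors in this example follow directly from the speeds, distances, and energies quoted above, and a short script makes the comparison explicit. The stage labels and the extra transit-time column (distance divided by speed) are my own additions for illustration.

```python
# (stage label, speed in m/s, distance in m, energy in fJ), as quoted in the text
stages = [
    ("chemical diffusion", 1e-3, 20e-9, 2.3),
    ("passive electrical", 1.0, 10e-6, 23.0),
    ("active spike", 100.0, 1e-2, 6000.0),
]


def stage_ratios(stages):
    """Ratios of speed, distance, and energy between consecutive stages."""
    ratios = []
    for (name0, v0, d0, e0), (name1, v1, d1, e1) in zip(stages, stages[1:]):
        ratios.append({
            "from": name0,
            "to": name1,
            "speed": v1 / v0,
            "distance": d1 / d0,
            "energy": e1 / e0,
        })
    return ratios


# Time each stage needs to cover its own distance, in microseconds.
transit_times_us = {name: d / v * 1e6 for name, v, d, _ in stages}

ratios = stage_ratios(stages)
for r in ratios:
    print(f'{r["from"]} -> {r["to"]}: speed x{r["speed"]:.0f}, '
          f'distance x{r["distance"]:.0f}, energy x{r["energy"]:.0f}')
```

Computed from the quoted figures, the first transition is about 1000 times faster over about 500 times the distance for 10 times the energy, and the second is 100 times faster over 1000 times the distance for about 260 times the energy. Each stage also finishes within roughly 10 to 100 microseconds, which fits the point that speed is bought only where the distance demands it.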

Thus the brain spends more energy only when more speed is actually needed and only when more reliability is required. In contrast, digital computers operate on a rigid synchronous clock, and at each clock tick many transistors must reliably flip state. In summary, the apparent chaos of biological computation need not be unavoidable messiness, but rather may reflect desirable principles of highly energy efficient design. To achieve such efficiency in our AI hardware, it may be essential to follow such principles of biological computation.

AI for neuroscience and neuroscience for AI: a virtuous scientific spiral

Recent exciting developments in the interaction between neuroscience and AI involve the development of deep and recurrent neural network models as models for different brain regions of animals performing tasks. This approach has achieved success, for example, in the ventral visual stream [19], auditory cortex [20], prefrontal cortex [21], motor cortex [22], and retina [23,24]. In many of these cases, when a deep or recurrent network is trained to solve a task, its internal representations look strikingly similar to the internal neural activity patterns measured in an animal trained to solve the same task. Thus we obtain often highly complex yet surprisingly veridical models of the operation of different brain regions during different tasks, raising a fundamental question: how do we understand what these models are doing and how they work? More precisely, how do the learned network connectivity and neural dynamics generate task performance? AI currently faces the same problem in understanding what its neural models are actually doing. While some engineers argue that it is not necessary to understand how neural networks work - it only matters that they do work well - it is nevertheless likely that a deeper scientific understanding of how the successes and failures of current networks arise out of their connectivity and dynamics will subsequently lead to improved networks. Indeed, hardly ever in the history of the interaction between science and technology has a deeper scientific understanding not led to better technology. Moreover, in certain applications of AI, especially in medical diagnosis or law, explainable or interpretable AI is essential to widespread adoption. For example, doctors and judges would be loath to use the recommendations of AI systems on their cases if they could not understand why these systems made the decisions they did.

Thus both neuroscience and AI have deeply shared scientific goals of understanding how network performance and decision making arise as emergent properties of network connectivity and dynamics. Therefore the development of ideas and theories from theoretical neuroscience, and applied physics and mathematics, could help in analyzing AI systems. Moreover, the behavior of AI systems could change the nature of experimental design in neuroscience, focusing the experimental effort on those aspects of network function that are poorly understood in AI. Overall, there is much to be gained from tighter connections between neuroscience, AI, and many other theoretical disciplines, which could bring about unified laws for the emergence of intelligence in biological and artificial systems alike, as we suggest next.

Seeking universal laws governing both biological and artificial intelligence

An oft-quoted trope to argue for ignoring biology in the design of AI systems involves the comparison of planes to birds. After all, if we wish to create artificial machines that propel humans into the air, it now seems ridiculous to mimic biological ingredients like feathers and flapping wings in order to invent flying machines. However, a closer inspection of this idea reveals much more nuance. The general problem of flight involves solving two fundamental problems: (1) the generation of thrust in order to move forward, and (2) the generation of lift so that we do not fall out of the sky. Birds and planes do indeed solve the problem of thrust very differently; birds flap their wings and planes use jet engines. However, they solve the problem of lift in exactly the same way, by using a curved wing shape that generates higher air pressure below and lower air pressure above. Thus gliding birds and planes operate very similarly.

Indeed, we know that there are general physical laws of aerodynamics governing the motion of different shapes through air that yield computable methods for predicting generated forces like lift and thrust. Moreover, any solution to the problem of flight, no matter whether biological or artificial, must obey the laws of aerodynamics. While there may be different viable solutions to the problem of flight under aerodynamic constraints, such solutions may share common properties (e.g., methods for generating lift), while simultaneously differing in other properties (e.g., methods for generating thrust). And finally, while on the subject of flight, there may yet be further engineering inspiration to be gleaned from the biological control laws implemented by the lowly fruit fly. Such flies are capable of rapid aerial maneuvers that far outstrip the capabilities of the world’s most sophisticated fighter jets.

More generally, in our study of the physical world, we are used to the notion that there exist principles or laws governing its behavior. For example, just as aerodynamics governs the motion of flying objects, general relativity governs the curvature of space and time, and quantum mechanics governs the evolution of the nanoworld. We believe that there may also exist general principles, or laws, that govern how intelligent behavior can emerge from the cooperative activity of large interconnected networks of neurons. These laws could connect and unify the related disciplines of neuroscience, psychology, cognitive science and AI, and their elucidation would also require help from (as well as contribute to the development of) analytic and computational fields like physics, mathematics and statistics. Indeed, the author of this post has used techniques from dynamical systems theory [25–28], statistical mechanics [29–33], Riemannian geometry [34], random matrix theory [13,35], and free probability theory [36] to obtain conceptual insights into the operation of biological and artificial networks alike. However, elucidating general laws and design principles governing the emergence of intelligence from nonlinear distributed circuits will require much further work, including the development of new concepts, analysis methods, and engineering capabilities. Ultimately, just like the story of birds, planes and aerodynamics, there may be diverse solutions to the problem of creating intelligent machines, with some components shared between biological and artificial solutions, while others may differ. By seeking general laws of intelligence, we could more efficiently understand and traverse this solution space.

Creating a nurturing academic environment at Stanford through the human-centered AI initiative

Discovering potential laws of emergent intelligence applicable to both biological and artificial systems alike, and building new types of AI inspired by neuroscience and psychology, requires the concerted effort of many investigators working together: computer scientists and engineers in pursuit of better AI systems, neuroscientists, psychologists and cognitive scientists probing the properties of the brain and mind, and mathematicians, physicists, statisticians, and other theorists seeking to formalize our combined knowledge and discover general laws and principles. In essence we need to create a new community of researchers traversing these disparate disciplines and freely exchanging ideas, insulated from the pressure of generating short term research results prevalent under both government grant funding mechanisms and industry funding models. We also need to train a next generation of students and thought leaders who are cognizant of techniques and knowledge across many fields, putting together in some sense parts of the computer scientist, neurobiologist, psychologist, and mathematical theorist in the same brain.

These are the goals of one focus area of our newly formed human-centered AI initiative: generating new AI systems inspired by human intelligence. Our initiative will work closely with existing centers and institutes of excellence related to this mission, including the Stanford Artificial Intelligence Lab, the Wu Tsai Neuroscience Institute, the Center for Mind Brain Computation and Technology, the Stanford Institute for Theoretical Physics and the Information Systems Lab, among many others. Moreover we will draw from the expertise of leading Stanford academic departments and programs, including (but not limited to) computer science, electrical engineering, neuroscience, psychology, linguistics, philosophy, education, mathematics, physics, and statistics. By creating and nurturing a new community of interdisciplinary scholars, HAI at Stanford aims to catalyze the intertwined quest for understanding biological intelligence and creating artificial intelligence, which may well be one of the most exciting intellectual activities of this century and beyond.

References