Research in neural networks is often dated to 1958, when Frank Rosenblatt published his perceptron model (Rosenblatt 1958). Rosenblatt was a psychologist whose work was funded by the US Navy, and the perceptron algorithm he developed was explicitly inspired by a mathematical idealization of the neuron, the brain’s most basic information processing unit. Biological neurons are connected by synapses that enable communication through electrical or chemical signals. Building on this idea, Rosenblatt proposed a model architecture in which input features are mapped to outputs through an intermediate layer of neurons (see Fig. 2). The weights connecting these components are analogous to the synaptic strengths of incoming and outgoing channels. At the output layer, values are passed through a nonlinear activation function to mimic the thresholding effect of biological neurons, which respond to stimuli by either firing or not firing.
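To make this architecture concrete, the following is a minimal Python sketch of a perceptron with a thresholding activation; the toy OR dataset, learning rate, and training loop are illustrative assumptions rather than details drawn from Rosenblatt (1958).

```python
import numpy as np

# Minimal sketch of a Rosenblatt-style perceptron. The OR dataset,
# learning rate, and epoch count are illustrative assumptions, not
# details from Rosenblatt (1958).

def step(z):
    """Threshold activation: the neuron either fires (1) or does not (0)."""
    return (z >= 0).astype(int)

def predict(x, w, b):
    """Weighted sum of inputs passed through the threshold."""
    return step(x @ w + b)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # toy input features
y = np.array([0, 1, 1, 1])                      # toy labels (logical OR)

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(10):                   # a few passes over the data
    for xi, yi in zip(X, y):
        err = yi - predict(xi, w, b)
        w += lr * err * xi            # adjust "synaptic" weights
        b += lr * err
print(predict(X, w, b))               # -> [0 1 1 1]
```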

Fig. 2 (From Hastie et al. (2009), p. 393) Schematic depiction of a single-layered neural network. Input features X are combined at each neuron Z, which in turn combine to produce predictions Y.

Neural networks have evolved considerably since Rosenblatt first published his perceptron model. Modern variants of the algorithm tend to include many more layers—hence the name deep neural networks (DNNs)—an approach inspired at least in part by anatomical research. In their influential study of the cat visual cortex, Hubel and Wiesel (1962) differentiated between so-called “simple” cells, which detect edges and curves, and “complex” cells, which combine simple cells to identify larger shapes with greater spatial invariance. The authors hypothesized that a hierarchy of neural layers could enable increasingly complex cognition, allowing the brain to operate at higher levels of abstraction. DNNs implement this theory at scale. Employing complex convolutional architectures (Krizhevsky et al. 2012) and clever activation functions (Glorot et al. 2011), DNNs have led the latest wave of excitement about and funding for AI research. Descendants of the perceptron algorithm now power translation services for Google (Wu et al. 2016), facial recognition software for Facebook (Taigman et al. 2014), and virtual assistants like Apple’s Siri (Siri Team 2017).
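As a rough illustration of what the layered architecture involves, the sketch below chains several layers with ReLU activations (the activation function popularized by Glorot et al. 2011); the layer sizes and random weights are arbitrary toy values, not taken from any cited model.

```python
import numpy as np

# Hedged sketch of a deep feedforward pass with ReLU activations, the
# activation function popularized by Glorot et al. (2011). Layer sizes
# and weights are arbitrary toy values, not taken from any cited model.

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)
layer_sizes = [8, 16, 16, 4]          # input -> two hidden layers -> output
weights = [rng.normal(scale=0.5, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def deep_forward(x):
    """Each layer re-represents its input, loosely echoing the
    simple-to-complex cell hierarchy of Hubel and Wiesel (1962)."""
    for W in weights[:-1]:
        x = relu(x @ W)               # hidden layers: nonlinear features
    return x @ weights[-1]            # final linear layer

print(deep_forward(rng.normal(size=8)).shape)   # -> (4,)
```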

The biomimetic approach to AI has always inspired the popular imagination. Writing about Rosenblatt’s perceptron, the New York Times declared in 1958 that “The Navy has revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence” (New York Times 1958, p. 25). The exuberance has only been somewhat tempered by the intervening decades. The same newspaper recently published a piece on DeepMind’s AlphaZero, a deep reinforcement learning system that has defeated world-champion programs at chess, shogi, and Go (Silver et al. 2018). In the essay, Steven Strogatz describes the algorithm in almost breathless language:

Most unnerving was that AlphaZero seemed to express insight. It played like no computer ever has, intuitively and beautifully, with a romantic, attacking style. It played gambits and took risks…. AlphaZero had the finesse of a virtuoso and the power of a machine. It was humankind’s first glimpse of an awesome new kind of intelligence. (Strogatz 2018)

Excitement about DNNs is hardly limited to the popular press. (Strogatz, it should be noted, is a professor of mathematics.) Some leading researchers in deep learning have suggested that the anthropomorphic connection in fact runs both ways, proposing that “neural networks from AI can be used as plausible simulacra of biological brains, potentially providing detailed explanations of the computations occurring therein” (Hassabis et al. 2017, p. 254).

Indeed, this is more or less the central tenet of connectionism, a decades-old movement in cognitive science and philosophy that has seen a renaissance with the recent success of deep learning (Buckner and Garson 2019). DNNs have been used to model information processing at various stages of the visual cortex of human and nonhuman primates (Cichy et al. 2016; Kriegeskorte 2015; Yamins and DiCarlo 2016), achieving state-of-the-art predictive performance while simultaneously suggesting novel subcortical functions. Stinson (2016) reviews a number of epistemological advantages of connectionism, which she maintains is unique among computational models in its ability to reveal generic mechanisms in the brain. At least one philosopher has argued that DNNs instantiate a mode of “transformational abstraction” that resolves longstanding debates between rationalists and empiricists (Buckner 2018).

There is no denying that the achievements of AlphaZero and other top-performing DNNs are impressive. But a large and growing strain of literature in computational statistics has recently emphasized the limitations of these algorithms, which deviate from human modes of learning in several fundamental and alarming ways. A complete list of the differences between DNNs and biological neural networks would be too long to enumerate here; see Marcus (2018) for a good overview. Instead I will highlight three especially noteworthy dissimilarities that underscore the shortcomings of this paradigm, which I argue has been vastly overhyped since DNNs first attained state-of-the-art performance in speech recognition (Dahl et al. 2012; Mohamed et al. 2012; Raina et al. 2009) and image classification tasks (Krizhevsky et al. 2012; LeCun et al. 2015; LeCun et al. 1998). These results notwithstanding, there is good reason to believe that, compared to human brains, DNNs are brittle, inefficient, and myopic in specific senses to be explained below.

DNNs tend to break down in the face of minor attacks. In a landmark paper, Goodfellow et al. (2014) introduced generative adversarial networks (GANs), in which one DNN learns to fool another; closely related work on adversarial examples showed that slight perturbations of the input features suffice to mislead even high-performing classifiers. For instance, by adding just a small amount of noise to the pixels of a photograph, Goodfellow et al. (2015) were able to trick a high-performing ImageNet classifier into mislabeling a panda as a gibbon, even though differences between the two images are imperceptible to the human eye (see Fig. 3). Others have fooled DNNs into misclassifying zebras as horses (Zhu et al. 2017), bananas as toasters (Brown et al. 2017), and many other absurd combinations. While adversarial examples were originally viewed as something of a curiosity in the deep learning community, they have since been widely recognized as a profound challenge that may undermine the applicability of DNNs to safety-critical areas such as clinical medicine (Finlayson et al. 2019) or autonomous vehicles (Eykholt et al. 2018). Needless to say, humans are much more resilient to minor perturbations of our sensory stimuli. This disconnect between biological and artificial neural networks suggests that the latter lack some crucial component essential to navigating the real world.

Fig. 3 Example of an adversarial perturbation from Goodfellow et al. (2015), p. 3.
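For readers curious about the mechanics, the following is a hedged numpy sketch of the fast gradient sign method from Goodfellow et al. (2015), applied to a toy logistic classifier rather than the ImageNet model discussed above; all weights and inputs are synthetic.

```python
import numpy as np

# Hedged numpy sketch of the fast gradient sign method (FGSM) from
# Goodfellow et al. (2015), applied to a toy logistic classifier rather
# than an ImageNet model. All weights and inputs are synthetic.

rng = np.random.default_rng(1)
w, b = rng.normal(size=100), 0.0      # toy classifier weights
x = 0.1 * w                           # a clean input the model classifies confidently
y = 1                                 # true label

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Gradient of the cross-entropy loss with respect to the *input* x
# is (p - y) * w for a logistic model.
p = sigmoid(x @ w + b)
grad_x = (p - y) * w

eps = 0.25                            # small relative to the pixel scale
x_adv = x + eps * np.sign(grad_x)     # nudge every pixel against the label

print(sigmoid(x @ w + b))             # high confidence on the clean input
print(sigmoid(x_adv @ w + b))         # confidence collapses after the attack
```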

Recent work on adversarial examples has complicated this conclusion somewhat. Elsayed et al. (2018) have shown that adversarial attacks negatively influence the predictive performance of time-limited humans. In a much larger study, Zhou and Firestone (2019) found that participants were able to decipher a wide array of adversarial examples. The reason may have to do with the generalizability of certain visual perturbations. Ilyas et al. (2019) demonstrate that attacks designed to fool one DNN often succeed in fooling others trained independently. The authors infer from this that adversarial examples are features, not bugs—i.e., that they encode true information about “non-robust” properties of the input data that may be incomprehensible to humans. Their work has sparked intense interest among machine learning researchers—see Engstrom et al. (2019) for a discussion—but it is not immediately clear what lessons are to be drawn for the connectionist. For even if adversarial attacks do reveal some otherwise imperceptible reality about the underlying geometry of visual data, the fact remains that those representations are largely inaccessible to humans. Zhou and Firestone attempt to show the opposite, but they specifically rule out attacks of the sort considered above, in which an image is misclassified through thousands of minor perturbations. Some ability to distinguish between what Ilyas et al. call “robust” and “non-robust” features—a distinction they acknowledge is inescapably anthropocentric—still appears essential.

Another important flaw with DNNs is that they are woefully data-inefficient. High-performing models typically need millions of examples to learn distinctions that would strike a human as immediately obvious. Geoffrey Hinton, one of the pioneers of DNNs and a recent recipient of the ACM’s prestigious Turing Award for excellence in computing, has raised the issue himself in interviews. “For a child to learn to recognize a cow,” he remarked, “it’s not like their mother needs to say ‘cow’ 10,000 times” (Waldrop 2019). Indeed, even very young humans are typically capable of one-shot learning, generalizing from just a single instance. This is simply impossible for most DNNs, a limitation that is especially frustrating in cases where abundant, high-quality data are prohibitively expensive or difficult to collect. Gathering large volumes of labelled photographs is not especially challenging, but comparable datasets for genetics or particle physics are another matter altogether.
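By way of contrast, here is a minimal sketch of what one-shot classification asks of a model: a single labelled exemplar per class, with new inputs assigned to the nearest exemplar. In real systems the feature vectors would come from a learned embedding (as in the matching networks of Vinyals et al. 2016); the vectors and labels below are hypothetical.

```python
import numpy as np

# Hedged sketch of one-shot classification: a single labelled exemplar
# per class, with new inputs assigned to the nearest exemplar. In real
# systems the vectors would come from a learned embedding (e.g. Vinyals
# et al. 2016); the vectors and labels here are hypothetical.

support = {                           # one example per class ("one shot")
    "cow":   np.array([0.9, 0.1, 0.0]),
    "horse": np.array([0.7, 0.6, 0.1]),
}

def one_shot_classify(x):
    """Return the label whose lone exemplar lies closest to x."""
    return min(support, key=lambda label: np.linalg.norm(x - support[label]))

print(one_shot_classify(np.array([0.85, 0.2, 0.05])))   # -> cow
```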

Reinforcement learning arguably offers a clever workaround to this problem: synthetic data are generated as part of the training process (Sutton and Barto 2018). However, this approach is constrained by our ability to simulate realistic data for the target system. Preprocessing strategies have been developed for data augmentation in image recognition tasks (Perez and Wang 2017), but again these are not universally applicable. More importantly, neither solution addresses the underlying issue—we want models to learn more with less data, not generate their own data so they can continue in their profligate ways. A more explicitly biomimetic approach would be to develop memory-augmented systems, and several labs have made good progress in this area (Collier and Beel 2018; Graves et al. 2016; Vinyals et al. 2016). Unfortunately, these models often fail during training or are very slow to converge, which explains why they have only been implemented for relatively simple tasks to date. Promising though these strands of research may be, one-shot learning remains a significant challenge for DNNs.
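To illustrate the data-augmentation strategies mentioned above (Perez and Wang 2017), the sketch below synthesizes “new” training images through label-preserving flips and crops; the image array and crop sizes are arbitrary stand-ins for a real photograph and pipeline.

```python
import numpy as np

# Hedged sketch of label-preserving data augmentation as surveyed by
# Perez and Wang (2017): each pass yields a slightly different copy of
# the same image. The random array below stands in for a real photo.

rng = np.random.default_rng(2)
image = rng.random((32, 32, 3))       # toy 32x32 RGB image

def augment(img, rng):
    """Randomly flip, then randomly crop, a copy of img."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]         # horizontal flip
    top, left = rng.integers(0, 5, size=2)
    return img[top:top + 28, left:left + 28, :]   # random 28x28 crop

batch = [augment(image, rng) for _ in range(8)]   # eight "new" examples
print(len(batch), batch[0].shape)     # -> 8 (28, 28, 3)
```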

A final important difference between human cognition and deep learning is that the latter has proven itself to be strangely myopic. The problem is most evident in the case of image classification. Careful analysis of the intermediate layers of convolutional DNNs reveals that whereas the lowest-level neurons deal in pixels, higher-level neurons operate on more meaningful features like eyes and ears, just as Hubel and Wiesel hypothesized (Olah et al. 2018). Yet even top-performing models can learn to discriminate between objects while completely failing to grasp their interrelationships. For instance, rearranging Kim Kardashian’s mouth and eye in Fig. 4 actually improved the DNN’s prediction, indicating that something is deeply wrong with the underlying model despite its strong performance on out-of-sample data (Bourdakos 2017).

Fig. 4 (From Bourdakos (2017)) Predictions from a convolutional DNN on two images of Kim Kardashian. Alarmingly, rearranging her facial features does not adversely affect the model’s prediction.

Zhou and Firestone (2019) hypothesize that the alleged myopia problem is just a byproduct of the requirement that DNNs select a label from a constrained choice set. They write:

Whereas humans have separate concepts for appearing like something vs. appearing to be that thing—as when a cloud looks like a dog without looking like it is a dog…[DNNs] are not permitted to make this distinction, instead being forced to play the game of picking whichever label in their repertoire best matches an image… (p. 8)

This account goes some way toward explaining the perplexing results of the perturbed Kim Kardashian image in Fig. 4. However, the true problem runs deeper than Zhou and Firestone suggest. Hinton argues that myopia is hardwired into convolutional DNNs via the max pooling function, which compresses the information between layers (Hinton et al. 2011). Max pooling discards valuable spatial information that humans use to identify and interact with objects, losing all semblance of structural hierarchies in the process. Thus any combination of eyes, nose, and mouth will suffice for a convolutional DNN—not because of external constraints on the choice set, but because of intrinsic limitations of the model architecture. Hinton and colleagues recently proposed a new architecture, the capsule network, in an effort to overcome these deficiencies (Hinton et al. 2018; Sabour et al. 2017), but the technology is still in its infancy.
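The point about discarded spatial information can be demonstrated in a few lines. The sketch below applies global max pooling, a simplified stand-in for the pooling layers Hinton criticizes, to two toy feature maps: one with facial features correctly arranged, one with them scrambled. The pooled representations are identical.

```python
import numpy as np

# Hedged illustration of the max pooling complaint: pooling records
# *whether* a feature fired, not *where*. Global max pooling is used
# here as a simplified stand-in for the pooling layers in a real
# convolutional DNN; the feature maps are toy arrays.

def global_max_pool(feature_map):
    """Collapse each channel to its maximum activation, discarding location."""
    return feature_map.max(axis=(0, 1))

face = np.zeros((4, 4, 2))            # channel 0: "eye" detector, 1: "mouth"
face[0, 1, 0] = 1.0                   # eye detected near the top
face[3, 2, 1] = 1.0                   # mouth detected near the bottom

scrambled = np.zeros((4, 4, 2))
scrambled[3, 2, 0] = 1.0              # eye where the mouth should be
scrambled[0, 1, 1] = 1.0              # mouth where the eye should be

print(global_max_pool(face))          # -> [1. 1.]
print(global_max_pool(scrambled))     # -> [1. 1.]  (identical representation)
```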


The problems of algorithmic brittleness, inefficiency, and myopia are not unique to DNNs—although these models are perhaps the worst offenders on all fronts—nor do they undermine the central premise of connectionism, a bold and fruitful theory that has generated much valuable research in AI, cognitive science, and philosophy of mind. What these objections do establish, however, is that the ostensible affinities between biological brains and modern DNNs should be treated with skepticism. The anthropomorphic hype around deep learning is uncritical and overblown. It would be a mistake to say that these algorithms recreate human intelligence; instead, they introduce some new mode of inference that outperforms us in some ways and falls short in others.

Often lost in the excitement surrounding DNNs is the fact that other approaches to machine learning exist, many with considerable advantages over neural networks on a wide range of tasks. The next three sections are devoted to several such methods, with an emphasis on their epistemological underpinnings and anthropomorphic connections.