Modern computing architecture based on the separation of memory and processing leads to a well-known problem called the von Neumann bottleneck, a restrictive limit on the data bandwidth between CPU and RAM. This paper introduces a new approach to computing we call AHaH computing, where memory and processing are combined. The idea is based on the attractor dynamics of volatile dissipative electronics inspired by biological systems, presenting an attractive alternative architecture that is able to adapt, self-repair, and learn from interactions with the environment. We envision that both von Neumann and AHaH computing architectures will operate together on the same machine, but that the AHaH computing processor may reduce the power consumption and processing time for certain adaptive learning tasks by orders of magnitude. The paper begins by drawing a connection between the properties of volatility, thermodynamics, and Anti-Hebbian and Hebbian (AHaH) plasticity. We show how AHaH synaptic plasticity leads to attractor states that extract the independent components of applied data streams and how they form a computationally complete set of logic functions. After introducing a general memristive device model based on collections of metastable switches, we show how adaptive synaptic weights can be formed from differential pairs of incremental memristors. We also disclose how arrays of synaptic weights can be used to build a neural node circuit implementing AHaH plasticity. By configuring the attractor states of the AHaH node in different ways, high-level machine learning functions are demonstrated. These include unsupervised clustering, supervised and unsupervised classification, complex signal prediction, unsupervised robotic actuation and combinatorial optimization of procedures, all key capabilities of biological nervous systems and modern machine learning algorithms with real-world application.

Competing interests: The authors of this paper have a financial interest in the technology derived from the work presented in this paper. Patents include the following: US6889216, Physical neural network design incorporating nanotechnology; US6995649, Variable resistor apparatus formed utilizing nanotechnology; US7028017, Temporal summation device utilizing nanotechnology; US7107252, Pattern recognition utilizing a nanotechnology-based neural network; US7398259, Training of a physical neural network; US7392230, Physical neural network liquid state machine utilizing nanotechnology; US7409375, Plasticity-induced self organizing nanotechnology for the extraction of independent components from a data stream; US7412428, Application of hebbian and anti-hebbian learning to nanotechnology-based physical neural networks; US7420396, Universal logic gate utilizing nanotechnology; US7426501, Nanotechnology neural network methods and systems; US7502769, Fractal memory and computational methods and systems based on nanotechnology; US7599895, Methodology for the configuration and repair of unreliable switching elements; US7752151, Multilayer training in a physical neural network formed utilizing nanotechnology; US7827131, High density synapse chip using nanoparticles; US7930257, Hierarchical temporal memory utilizing nanotechnology; US8041653, Method and system for a hierarchical temporal memory utilizing a router hierarchy and hebbian and anti-hebbian learning; US8156057, Adaptive neural network utilizing nanotechnology-based components. Additional patents are pending. Authors of the paper are owners of the commercial companies performing this work. Companies include the following: KnowmTech LLC, Intellectual Property Holding Company: Author Alex Nugent is a Co-owner; M. Alexander Nugent Consulting, Research and Development: Author Alex Nugent is owner and Tim Molter employee; Xeiam LLC, Technical Architecture: Authors Tim Molter and Alex Nugent are co-owners.
Products resulting from the technology described in this paper are currently being developed. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials. The authors agree to make freely available any materials and data described in this publication that may be reasonably requested for the purpose of academic, non-commercial research. As part of this, the authors have open-sourced all code and data used to generate the results of this paper under a “M. Alexander Nugent Consulting Research License”.

Funding: This work has been supported in part by the Air Force Research Labs (AFRL) and Navy Research Labs (NRL) under the SBIR/STTR programs AF10-BT31, AF121-049 and N12A-T013 ( http://www.sbir.gov/about/about-sttr ; http://www.sbir.gov/# ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2014 Nugent, Molter. This is an open-access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

In 2008, HP Laboratories announced the production of Chua’s postulated electronic device, the memristor [48], and explored its use as a synapse in neuromorphic circuits [49]. Several memristive devices had been reported before this time, predating HP Laboratories [50] – [54], but they were not described as memristors. In the same year, Hylton and Nugent launched the Systems of Neuromorphic Adaptive Plastic Scalable Electronics (SyNAPSE) program with the goal of demonstrating large-scale adaptive learning in integrated memristive electronics at biological scale and power. Since 2008 there has been an explosion of worldwide interest in memristive devices [55] – [59], device models [60] – [65], their connection to biological synapses [66] – [72], and their use in alternative computing architectures [73] – [84].

In 2004, Nugent et al. showed how the AHaH plasticity rule can be derived via the minimization of a kurtosis objective function and used as the basis of self-organized fault tolerance in support vector machine network classifiers, demonstrating the connection between margin maximization, independent component analysis and neural plasticity [43], [44]. In 2006, Nugent first detailed how to implement the AHaH plasticity rule in memristive circuitry and demonstrated that the AHaH attractor states can be used to configure a universal reconfigurable logic gate [45] – [47].

At roughly the same time, the theory of support vector maximization emerged from earlier work on statistical learning theory from Vapnik and Chervonenkis and has become a generally accepted solution to the generalization versus memorization problem in classifiers [12] , [42] .

Bienenstock, Cooper and Munro published a theory of synaptic modification in 1982 [37]. Now known as the BCM plasticity rule, this theory attempts to account for experiments measuring the selectivity of neurons in primary sensory cortex and its dependency on neuronal input. When presented with data from natural images, the BCM rule converges to selective oriented receptive fields. This provides compelling evidence that the same mechanisms are at work in cortex, as validated by the experiments of Hubel and Wiesel. In 1989 Barlow reasoned that such selective responses should emerge from an unsupervised learning algorithm that attempts to find a factorial code of independent features [38]. Bell and Sejnowski extended this work in 1997 to show that the independent components of natural scenes are edge filters [39]. This provided a concrete mathematical statement on neural plasticity: neurons modify their synaptic weights to extract independent components. Building a mathematical foundation of neural plasticity, Oja and collaborators derived a number of plasticity rules by specifying statistical properties of the neuron’s output distribution as objective functions. This led to the principle of independent component analysis (ICA) [40], [41].

VLSI pioneer Mead published with Conway the landmark text Introduction to VLSI Systems in 1980 [35] . Mead teamed with John Hopfield and Feynman to study how animal brains compute. This work helped to catalyze the fields of Neural Networks (Hopfield), Neuromorphic Engineering (Mead) and Physics of Computation (Feynman). Mead created the world’s first neural-inspired chips including an artificial retina and cochlea, which was documented in his book Analog VLSI Implementation of Neural Systems published in 1989 [36] .

In 1971, Chua postulated, on the basis of symmetry arguments, the existence of a missing fourth two-terminal circuit element called a memristor (memory resistor), whose resistance depends on the integral of the input applied to its terminals [33], [34].

In 1969, the initial excitement over perceptrons was tempered by the work of Minsky and Papert, who analyzed some of the properties of perceptrons and illustrated how they could not compute the XOR function using only local neurons [29]. The reaction to Minsky and Papert diverted attention away from connectionist networks until the emergence of a number of new ideas, including Hopfield networks (1982) [30], back propagation of error (1986) [31], adaptive resonance theory (1987) [32], and many other permutations. The wave of excitement in neural networks began to fade as the key problem of generalization versus memorization became better appreciated and the computing revolution took off.

In 1960, Widrow and Hoff developed ADALINE, a physical device that used electrochemical plating of carbon rods to emulate the synaptic elements that they called memistors [28] . Unlike memristors, memistors are three terminal devices, and their conductance between two of the terminals is controlled by the time integral of the current in the third. This work represents the first integration of memristive-like elements with electronic feedback to emulate a learning system.

In 1953, Barlow discovered that neurons in the frog brain fire in response to specific visual stimuli [26]. This was a precursor to the experiments of Hubel and Wiesel, who showed in 1959 the existence of neurons in the primary visual cortex of the cat that selectively respond to edges at specific orientations [27]. This led to the theory of receptive fields, where cells at one level of organization are formed from inputs from cells at a lower level of organization.

In 1949, only one year after Turing wrote ‘Intelligent machinery’, synaptic plasticity was proposed as a mechanism for learning and memory by Hebb [24] . Ten years later in 1958 Rosenblatt defined the theoretical basis of connectionism and simulated the perceptron, leading to some initial excitement in the field [25] .

In 1944, physicist Schrödinger published the book What is Life? based on a series of public lectures delivered at Trinity College in Dublin. Schrödinger asked the question: “How can the events in space and time which take place within the spatial boundary of a living organism be accounted for by physics and chemistry?” He described an aperiodic crystal that predicted the nature of DNA, yet to be discovered, as well as the concept of negentropy: the entropy that a living system exports to keep its own entropy low [23].

In 1936, Turing, best known for his pioneering work in computation and his seminal paper ‘On computable numbers’ [21], provided a formal proof that a machine could be constructed to perform any conceivable mathematical computation representable as an algorithm. This work rapidly evolved to become the computing industry of today. Few people are aware that, in addition to the work leading to the digital computer, Turing anticipated connectionism and neuron-like computing. In his paper ‘Intelligent machinery’ [22], which he wrote in 1948 but which was not published until 1968, well after his death, Turing described a machine that consists of artificial neurons connected in any pattern with modifier devices. Modifier devices could be configured to pass or destroy a signal, and the neurons were composed of NAND gates, which Turing chose because any other logic function can be created from them.

Not only does it make physical sense to build large-scale adaptive systems from volatile components; there is also no supporting evidence to suggest that the contrary is possible. A brain is a volatile, dissipative, out-of-equilibrium structure. It is therefore reasonable to expect that a volatile solution to machine learning at low power and high density exists. The goal of AHaH computing is to find and exploit this solution.

In the non-volatile case some process external to the switch (i.e. an algorithm on a CPU) must provide the energy needed to effect the state transition. In the volatile case an external process must stop providing the energy needed for state repair. These two antisymmetric conditions can be summarized as: “Stability for free, adaptation for a price” and “adaptation for free, stability for a price”, respectively.

A volatile switch on the other hand cannot be read without damaging its state. Each read operation lowers the switch barriers and increases the probability of random state transitions. Accumulated damage to the state must be actively repaired. In the absence of repair, the act of reading the state is alone sufficient to induce state transitions. The distance that must be traversed between memory and processing of an adaptation event goes to zero as the system becomes intrinsically adaptive. The act of accessing the memory becomes the act of configuring the memory.
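The read-disturb and repair dynamics described above can be sketched in a few lines. The following toy Monte Carlo model is our own illustration, not the authors' device model: a two-state switch has an energy barrier that each read operation lowers, giving a Boltzmann-like flip probability, while active repair restores the barrier. All rates and barrier heights are invented for illustration.

```python
import math
import random

def read_disturb(n_reads, repair=False, seed=0):
    """Count state flips of a toy volatile switch under repeated reads.

    Each read lowers the energy barrier by a fixed amount; with
    'repair' enabled, an external process restores the barrier after
    every read, so the state persists. Units are arbitrary (kT = 1).
    """
    rng = random.Random(seed)
    state, barrier = 1, 10.0
    flips = 0
    for _ in range(n_reads):
        barrier -= 0.5              # each read damages the barrier
        if repair:
            barrier = 10.0          # active repair restores it
        # Boltzmann-like flip probability given the remaining barrier
        if rng.random() < math.exp(-max(barrier, 0.0)):
            state = -state
            flips += 1
    return flips

print(read_disturb(100, repair=False))  # many flips: reads destroy the state
print(read_disturb(100, repair=True))   # state survives under constant repair
```

Without repair the barrier collapses after enough reads and the state randomizes on every subsequent access; with repair, flips are vanishingly rare but energy must be dissipated continuously, which is the trade-off the text describes.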

In the non-volatile case, sufficient energy must be applied to overcome the barrier potential. Energy must be dissipated in proportion to the barrier height once a switching event takes place. Rather than just the switch, it is also the electrode leading to the switch that must be raised to the switch barrier energy. As the number of adaptive variables increases, the power required to sustain the switching events scales as the total distance needed to communicate the switching events and the square of the voltage.

Consider two switches, one non-volatile and the other volatile. Furthermore, consider what it takes to change the state of each of these switches, which is the most fundamental act of adaptation or reconfiguration. Abstractly, a switch can be represented as a potential energy well with two or more minima.

At the core of the adaptive power problem is the energy wasted during memory–processor communication. The ultimate solution to the problem entails finding ways to let memory configure itself, and AHaH computing is one such method.

As an example, consider that IBM’s recent cat-scale cortical simulation of 1 billion neurons and 10 trillion synapses [18] required 147,456 CPUs and 144 TB of memory running in real time. At a power consumption of 20 W per CPU, this is 2.9 MW. Under perfect scaling, a real-time simulation of a human-scale cortex would dissipate over 7 GW of power. The number of adaptive variables under constant modification in the IBM simulation is orders of magnitude less than in its biological counterpart, and yet its power dissipation is orders of magnitude larger. In another example, Google trained neural networks on YouTube data and roughly doubled the accuracy of previous attempts [19]. The effort took an array of 16,000 CPU cores working at full capacity for 3 days. The model contained 1 billion connections, which, although impressive, pales in comparison to biology. The average human neocortex contains 150,000 billion connections [20], and the number of synapses in the neocortex is a fraction of the total number of connections in the brain. At 20 W per core, Google’s simulation consumed about 320 kW. Under perfect scaling, a human-scale simulation would dissipate 48 GW of power.
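The directly computed figures above can be checked with simple arithmetic, assuming 20 W per CPU or core as stated and perfect scaling of connections (the 7 GW human-scale IBM estimate depends on additional scaling assumptions not restated here, so only the explicit products are verified):

```python
# Back-of-the-envelope check of the quoted power figures.
ibm_power_w = 147_456 * 20        # IBM cat-scale simulation, 20 W per CPU
google_power_w = 16_000 * 20      # Google YouTube experiment, 20 W per core

# Scale Google's 1 billion connections to the ~150,000 billion
# connections of a human neocortex, assuming perfect scaling.
google_human_scale_w = google_power_w * 150_000

print(ibm_power_w / 1e6)          # megawatts, about 2.9 MW
print(google_power_w / 1e3)       # kilowatts, 320 kW
print(google_human_scale_w / 1e9) # gigawatts, 48 GW
```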

Through constant dissipation of free energy, living systems continuously repair their seemingly fragile state. A byproduct of this condition is that living systems are intrinsically adaptive at all scales, from cells to ecosystems. This presents a difficult challenge when we attempt to simulate such large scale adaptive networks with modern von Neumann computing architectures. Each adaptation event must necessarily reduce to memory–processor communication as the state variables are modified. The energy consumed in shuttling information back and forth grows in line with the number of state variables that must be continuously modified. For large scale adaptive systems like the brain, the inefficiencies become so large as to make simulations impractical.

Our goal is to lay a foundation for a new type of practical computing based on the configuration and repair of volatile switching elements. We traverse the large gap from volatile memristive devices to demonstrations of computational universality and machine learning. The reader should keep in mind that the subject matter in this paper is necessarily diverse, but is essentially an elaboration of these three points:

A brain, like all living systems, is a far-from-equilibrium energy dissipating structure that constantly builds and repairs itself. We can shift the standard question from “how do brains compute?” or “what is the algorithm of the brain?” to a more fundamental question of “how do brains build and repair themselves as dissipative attractor-based structures?” Just as a ball will roll into a depression, an attractor-based system will fall into its attractor states. Perturbations (damage) will be fixed as the system reconverges to its attractor state. As an example, if we cut ourselves we heal. To bestow this property on our computing technology we must find a way to represent our computing structures as attractors. In this paper we detail how the attractor points of a plasticity rule we call Anti-Hebbian and Hebbian (AHaH) plasticity are computationally complete logic functions as well as building blocks for machine learning functions. We further show that AHaH plasticity can be attained from simple memristive circuitry attempting to maximize circuit power dissipation in accordance with ideas in nonequilibrium thermodynamics.

How does nature compute? Attempting to answer this question naturally leads one to consider biological nervous systems, although examples of computation abound in other manifestations of life. Some examples include plants [1] – [5], bacteria [6], protozoa [7], and swarms [8], to name a few. Most attempts to understand biological nervous systems fall along a spectrum. One end of the spectrum attempts to mimic the observed physical properties of nervous systems. These models necessarily contain parameters that must be tuned to match the biophysical and architectural properties of the natural model. Examples of this approach include Boahen’s neuromorphic circuits at Stanford University and the Neurogrid processor [9], the mathematical spiking neuron model of Izhikevich [10], as well as the large-scale modeling of Eliasmith [11]. The other end of the spectrum abandons biological mimicry in an attempt to algorithmically solve the problems associated with brains such as perception, planning and control. This is generally referred to as machine learning. Algorithmic examples include support vector maximization [12], k-means clustering [13] and random forests [14]. Many approaches fall somewhere along the spectrum between mimicry and machine learning, such as the CAVIAR [15] and Cognimem [16] neuromorphic processors as well as IBM’s neurosynaptic core [17]. In this paper we consider an alternative approach outside of the typical spectrum by asking ourselves a simple but important question: How can a brain compute given that it is built of volatile components?

Theory

On the Origins of Algorithms and the 4th Law of Thermodynamics

Turing spent the last two years of his life working on mathematical biology and published a paper titled ‘The chemical basis of morphogenesis’ in 1952 [85]. Turing was likely struggling with the concept that algorithms represent structure, brains and life in general are clearly capable of creating such structure, and brains are ultimately a biological chemical process that emerges from chemical homogeneity. How does complex spatial-temporal structure such as an algorithm emerge from the interaction of a homogeneous collection of units? Answering this question in a physical sense leads one straight into the controversial 4th law of thermodynamics. The 4th law attempts to answer a simple question with profound consequences if a solution is found: if the 2nd law says everything tends towards disorder, why does essentially everything we see in the Universe contradict this? At almost every scale of the Universe we see self-organized structures, from black holes to stars, planets and suns to our own earth, the life that abounds on it and in particular the brain. Non-biological systems such as Bénard convection cells [86], tornadoes, lightning and rivers, to name just a few, show us that matter does not tend toward disorder in practice but rather does quite the opposite. In another example, metallic spheres in a non-conducting liquid medium exposed to an electric field will self-organize into fractal dendritic trees [87]. One line of argument is that ordered structures create entropy faster than disordered structures do and that self-organizing dissipative systems are the result of out-of-equilibrium thermodynamics. In other words, there may not actually be a distinct 4th law, and all observed order may result from dynamics yet to be unraveled mathematically from the 2nd law.
Unfortunately this argument does not leave us with an understanding sufficient to allow us to exploit the phenomena in our technology. In this light, our work with AHaH attractor states may provide a clue as to the nature of the 4th law in so much as it lets us construct useful self-organizing and adaptive computing systems. One particularly clear and falsifiable formulation of the 4th law comes from Swenson in 1989: “A system will select the path or assembly of paths out of available paths that minimizes the potential or maximizes the entropy at the fastest rate given the constraints [88].” Others have converged on similar thoughts. For example, Bejan postulated in 1996 that: “For a finite-size system to persist in time (to live), it must evolve in such a way that it provides easier access to the imposed currents that flow through it [89].” Bejan’s formulation seems intuitively correct when one looks at nature, although it has faced criticism that it is too vague since it does not say what particle is flowing. We observe that in many cases the particle is either directly a carrier of free energy dissipation or else it gates access, like a key to a lock, to free energy dissipation of the units in the collective. These particles are not hard to spot. Examples include water in plants, ATP in cells, blood in bodies, neurotrophins in brains, and money in economies. More recently, Jorgensen and Svirezhev have put forward the maximum power principle [90] and Schneider and Sagan have elaborated on the simple idea that “nature abhors a gradient” [91]. Others have put forward similar notions much earlier. Morowitz claimed in 1968 that the flow of energy from a source to a sink will cause at least one cycle in the system [91] and Lotka postulated the principle of maximum energy flux in 1922 [92].

The Container Adapts

Hatsopoulos and Keenan’s law of stable equilibrium [93] states that: “When an isolated system performs a process, after the removal of a series of internal constraints, it will always reach a unique state of equilibrium; this state of equilibrium is independent of the order in which the constraints are removed.” The idea is that a system erases any knowledge about how it arrived in equilibrium. Schneider and Sagan state this observation in their book Into the Cool: Energy Flow, Thermodynamics, and Life [91] by claiming: “These principles of erasure of the path, or past, as work is produced on the way to equilibrium hold for a broad class of thermodynamic systems.” This principle has been illustrated by connected rooms, where doors between the rooms are opened according to a particular sequence, and only one room is pressurized at the start. The end state is the same regardless of the path taken to get there. The problem with this analysis is that it relies on an external agent: the door opener. We may reformulate this idea in the light of an adaptive container, as shown in Figure 1. A first replenished pressurized container is allowed to diffuse into two non-pressurized empty containers, a and b, through a region of matter M. Let us presume that the initial fluid conductance G_a to container a is greater than the conductance G_b to container b. Competition for limited resources within the matter (conservation of matter) enforces the condition that the sum of conductances is constant:

G_a + G_b = k (1)


Figure 1. AHaH process. A) A first replenished pressurized container is allowed to diffuse into two non-pressurized empty containers, a and b, through a region of matter M. B) The gradient across pathway a reduces faster than the gradient across pathway b due to the conductance differential. C) This causes G_b to grow more than G_a, reducing the conductance differential and leading to anti-Hebbian learning. D) The first detectable signal (work) is available at a owing to the differential that favors it. As a response to this signal, events may transpire in the environment that open up new pathways to particle dissipation. The initial conductance differential is reinforced, leading to Hebbian learning. https://doi.org/10.1371/journal.pone.0085175.g001

Now we ask how the container adapts as the system attempts to come to equilibrium. If it is the gradient that is driving the change in the conductance (Equation 2), then it becomes immediately clear that the container will adapt in such a way as to erase any initial differential conductance: the gradient across pathway a will reduce faster than the gradient across pathway b, and G_b will grow more than G_a. When the system comes to equilibrium we will find that the conductance differential has been reduced. The sudden pressurization of the favored container may have an effect on the environment. In the moments right after the flow sets up, the first detectable signal (work) will be available at a owing to the differential that favors it. As a response to this signal, any number of events could transpire in the environment that open up new pathways to particle dissipation. The initial conductance differential will be reinforced as the system rushes to equalize the gradient in this newly discovered space. Due to conservation of adaptive resources (Equation 1), an increase in G_a will require a drop in G_b, and vice versa. The result is that G_a grows while G_b falls toward zero, and the system selects one pathway over another.
The process illustrated in Figure 1 creates structure so long as new sinks are constantly found and a constant particle source is available. We now map this thermodynamic process to anti-Hebbian and Hebbian (AHaH) plasticity and show that the resulting attractor states support universal algorithms and broad machine learning functions. We furthermore show how AHaH plasticity can be implemented via physically adaptive memristive circuitry.
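The container argument can be made concrete with a toy simulation. The following sketch is our own construction, not the authors' model: two conductances obey the conservation constraint of Equation 1, each grows in proportion to the flow it carries, and an optional drain on the favored container stands in for a "newly discovered pathway" to dissipation. All rates are invented for illustration.

```python
def simulate(g_a=0.7, g_b=0.3, steps=200, drain_a=False):
    """Toy model of the adaptive container.

    A source at fixed pressure 1.0 feeds two closed containers through
    conductances g_a and g_b whose sum is conserved. Flow pressurizes
    each container, eroding its gradient; flow also reinforces the
    conductance that carried it, after which the pair is renormalized
    (conservation of matter). If drain_a is True, container a leaks to
    a new sink, sustaining its gradient.
    """
    total = g_a + g_b
    p_a = p_b = 0.0                  # container pressures
    for _ in range(steps):
        f_a = g_a * (1.0 - p_a)      # flows down the two gradients
        f_b = g_b * (1.0 - p_b)
        p_a += 0.05 * f_a            # containers pressurize
        p_b += 0.05 * f_b
        if drain_a:
            p_a *= 0.5               # new pathway keeps a's gradient alive
        g_a += 0.05 * f_a            # flow reinforces conductance...
        g_b += 0.05 * f_b
        s = (g_a + g_b) / total      # ...under the conservation constraint
        g_a, g_b = g_a / s, g_b / s
    return g_a, g_b

print(simulate())                    # closed sinks: differential shrinks
print(simulate(drain_a=True))        # open sink: one pathway is selected
```

With closed sinks the favored gradient collapses first, so late flow favors the weaker branch and the conductance differential is erased (anti-Hebbian); with a drain on the favored container, its gradient persists and its conductance grows at the expense of the other (Hebbian selection).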

Anti-Hebbian and Hebbian (AHaH) Plasticity

The thermodynamic process outlined above can be understood more broadly as: (1) particles spread out along all available pathways through the environment and in doing so erode any differentials that favor one branch over the other, and (2) pathways that lead to dissipation (the flow of the particles) are stabilized. Let us first identify a synaptic weight, w, as the differential conductance formed from two energy-dissipating pathways:

w = G_a − G_b (3)

We can now see that the synaptic weight possesses state information: if w > 0 the synapse is positive, and if w < 0 it is negative. With this in mind we can explicitly define AHaH learning: Anti-Hebbian (erase the path): Any modification to the synaptic weight that reduces the probability that the synaptic state will remain the same upon subsequent measurement.

Hebbian (select the path): Any modification to the synaptic weight that increases the probability that the synaptic state will remain the same upon subsequent measurement. Our use of Hebbian learning follows a standard mathematical generalization of Hebb’s famous postulate: “When an axon of cell A is near enough to excite B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased [24].” Hebbian learning can be represented mathematically as Δw = xy, where x and y are the activities of the pre- and post-synaptic neurons and Δw is the change to the synaptic weight between them. Anti-Hebbian learning is the negative of Hebbian: Δw = −xy. Notice that intrinsic to this mathematical definition is the notion of state: the pre- and post-synaptic activities as well as the weight may be positive or negative. We achieve the notion of state in our physical circuits via differential conductances (Equation 3).
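A minimal sketch of these two update rules acting on a weight stored as a differential pair of conductances (Equation 3) follows. This is our own illustration, not the authors' circuit: the conductance increments and the learning rate are invented, and only the incremental (non-decreasing) nature of the two pathways is kept.

```python
def update(g_a, g_b, x, y, mode, rate=0.01):
    """Apply a Hebbian (dw = +xy) or anti-Hebbian (dw = -xy) update to
    a weight stored as a differential pair: w = g_a - g_b.

    x and y are pre- and post-synaptic activities. Conductances can
    only be incremented, so a positive dw raises g_a and a negative
    dw raises g_b.
    """
    dw = rate * x * y if mode == "hebbian" else -rate * x * y
    if dw >= 0:
        g_a += dw
    else:
        g_b -= dw      # dw is negative, so g_b grows
    return g_a, g_b

g_a, g_b = update(0.5, 0.5, x=1.0, y=0.2, mode="hebbian")
print(g_a - g_b)       # weight nudged toward the sign of y, about +0.002
```

Note how the Hebbian update reinforces the sign of the measured state (the weight moves toward the sign of y), while the anti-Hebbian update pushes it back toward zero, exactly the "select the path" versus "erase the path" distinction above.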

Linear Neuron Model

To begin our mapping of AHaH plasticity to computing and machine learning systems we use a standard linear neuron model. The choice of a linear neuron is motivated by the fact that linear neurons are ubiquitous in machine learning and also because it is easy to achieve the linear sum function in a physical circuit, since currents naturally sum. The inputs x_i in a linear model are the outputs from other neurons or spike encoders (to be discussed). The weights w_i are the strengths of the inputs: the larger w_i, the more x_i affects the neuron’s output. Each input is multiplied by its corresponding weight and these values, combined with the bias b, are summed together to form the output y:

y = Σ_i w_i x_i + b (4)

The weights and bias change according to AHaH plasticity, which we further detail in the sections that follow. The AHaH rule acts to maximize the margin between positive and negative classes. In what follows, AHaH nodes refer to linear neurons implementing the AHaH plasticity rule.
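The sum of Equation 4 is a one-liner in code. The sketch below is purely illustrative (the function name and example values are ours, not from the authors' released code):

```python
def ahah_node_output(x, w, b):
    """Linear neuron of Equation 4: y = sum_i(w_i * x_i) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Example: three inputs, one of them silent (zero contribution).
y = ahah_node_output(x=[1.0, 0.0, 1.0], w=[0.5, -0.2, 0.3], b=-0.1)
print(y)  # 0.5 + 0.3 - 0.1, i.e. approximately 0.7
```

In the physical circuit this sum costs nothing extra: each weight contributes a current and Kirchhoff's current law performs the addition.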

AHaH Attractors Extract Independent Components

What we desire is a mechanism to extract the underlying building blocks, or independent components, of a data stream, irrespective of the number of discrete channels those components are communicated over. One method to accomplish this task is independent component analysis. The two broadest mathematical definitions of independence as used in ICA are (1) minimization of mutual information between competing nodes and (2) maximization of non-Gaussianity of the output of a single node. The non-Gaussian family of ICA algorithms uses negentropy and kurtosis as mathematical objective functions from which to derive a plasticity rule. To find a plasticity rule capable of ICA we can minimize a kurtosis objective function over the node output activation. The result is ideally the opposite of a peak: a bimodal distribution. That is, we seek a hyperplane that separates the input data into two classes, resulting in two distinct positive and negative distributions. Using a kurtosis objective function, it can be shown that a plasticity rule of the following form emerges (Equation 5) [43], where α and β are constants that control the relative contribution of Hebbian and anti-Hebbian plasticity, respectively. Equation 5 is one form of many that we call the AHaH rule. The important functional characteristic that Equation 5 shares with all the other forms is that as the magnitude of the post-synaptic activation grows, the weight update transitions from Hebbian to anti-Hebbian learning.
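This Hebbian-to-anti-Hebbian transition can be demonstrated with a small simulation. The functional form used below, dw_i = x_i·(α·y − β·y³), is our illustrative stand-in with the stated property (Hebbian for small |y|, anti-Hebbian for large |y|), not necessarily the authors' exact Equation 5; the data, rates and seed are likewise invented.

```python
import random

def train_ahah(data, alpha=0.01, beta=0.01, epochs=2000, seed=0):
    """Train a 2-input linear node with an AHaH-style rule.

    dw_i = x_i * (alpha*y - beta*y**3): the update is Hebbian while
    |y| is small and turns anti-Hebbian once |y| exceeds
    sqrt(alpha/beta), so stable outputs settle near that magnitude.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(2)]
    for _ in range(epochs):
        x = rng.choice(data)
        y = sum(wi * xi for wi, xi in zip(w, x))
        for i in range(2):
            w[i] += x[i] * (alpha * y - beta * y ** 3)
    return w

# Two overlapping input patterns; with alpha == beta the attractor
# pushes each pattern's output magnitude toward 1, producing the
# two-peaked (bimodal) output distribution described above.
data = [(1.0, 0.2), (0.2, 1.0)]
w = train_ahah(data)
for x in data:
    print(sum(wi * xi for wi, xi in zip(w, x)))
```

Whatever signs the initial random weights assign to the patterns, the rule amplifies them (Hebbian) and then saturates them (anti-Hebbian) near |y| = sqrt(α/β), rather than letting the weights grow without bound.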

AHaH Attractors Make Optimal Decisions

An AHaH node is a hyperplane attempting to bisect its input space so as to make a binary decision. There are many hyperplanes to choose from, and the question naturally arises as to which one is best. The generally agreed answer to this question is “the one that maximizes the separation (margin) of the two classes.” The idea of maximizing the margin is central to support vector machines, arguably one of the more successful machine learning algorithms. As demonstrated in [43], [44], as well as by the results of this paper, the attractor states of the AHaH rule coincide with the maximum-margin solution.

AHaH Attractors Support Universal Algorithms Given a discrete set of inputs and a discrete set of outputs, it is possible to account for all possible transfer functions via a logic function. Logic is usually taught as small two-input gates such as NAND and OR. However, when one looks at a more complicated algorithm such as a machine learning classifier, it is not so clear that it is performing a logic function. As demonstrated in the following sections, AHaH attractor states are computationally complete logic functions. For example, when robotic arm actuation or prediction is demonstrated, self-configuring logic functions are also being demonstrated. In what follows we will be adopting a spike encoding. A spike encoding consists of either a spike (1) or no spike (a floating channel, denoted f). In digital logic, the state ‘0’ is opposite or complementary to the state ‘1’, and it can be communicated. One cannot communicate a pulse of nothing. For this reason, we refer to a spike as ‘1’ and no spike as ‘f’, or floating, to avoid this confusion. Furthermore, the output of an AHaH node can be positive or negative and hence possesses a state. We can identify these positive and negative output states as logical outputs; for example, the standard logical ‘1’ is positive and ‘0’ is negative. Let us analyze the simplest possible AHaH node: one with only two inputs. The three possible input patterns are [x₀, x₁] ∈ {[1, f], [f, 1], [1, 1]} (6) Stable synaptic states will occur when the sum over all weight updates is zero. We can plot the AHaH node’s stable decision boundary on the same plot with the data that produced it. This can be seen in Figure 2, where decision boundaries A, B and C are labeled. Although the D state is theoretically achievable, it has been difficult to achieve in circuit simulations, and for this reason we exclude it as an available state. Note that every state has a corresponding anti-state.
The AHaH rule is a local update rule that attempts to maximize the margin between opposing positive and negative data distributions. As the positive distribution pushes the decision boundary away (making the weights more positive), the magnitude of the positive updates decreases while the magnitude of the opposing negative updates increases. The net result is that strong attractor states exist when the decision boundary can cleanly separate a data distribution.
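To make the attractor picture concrete, the following sketch iterates the assumed cubic form of the AHaH rule (as in Equation 5) over the three two-input spike patterns of Equation 6. Because these updates descend a smooth potential, the per-sweep weight change shrinks as the node settles into one of its attractor states; all rates and bounds here are illustrative:

```python
import random

def ahah_update(x, y, alpha=1.0, beta=1.0):
    # Hebbian for small |y|, anti-Hebbian for large |y|.
    return x * (alpha * y - beta * y**3)

def settle(seed=0, rate=0.01, sweeps=5000):
    """Iterate the rule over the three spike patterns of a two-input node.
    A floating (no-spike) channel contributes 0 to the activation."""
    random.seed(seed)
    w = [random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1)]
    patterns = [[1, 0], [0, 1], [1, 1]]  # i.e. [1,f], [f,1], [1,1]
    last_change = 0.0
    for _ in range(sweeps):
        last_change = 0.0
        for x in patterns:
            y = sum(wi * xi for wi, xi in zip(w, x))
            for i in range(2):
                dw = rate * ahah_update(x[i], y)
                w[i] += dw
                last_change = max(last_change, abs(dw))
    return w, last_change

w, residual = settle()
print(w, residual)  # weights remain bounded; the per-step change decays toward zero
```

Which attractor the node lands in depends on the random initialization; in the circuit, the bias weights described below inhibit the null (A) state.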


Figure 2. Attractor states of a two-input AHaH node. The AHaH rule naturally forms decision boundaries that maximize the margin between data distributions (black blobs). This is easily visualized in two dimensions, but it is equally valid for any number of inputs. Attractor states are represented by decision boundaries A, B, C (green dotted lines) and D (red dashed line). Each state has a corresponding anti-state. State A is the null state and its occupation is inhibited by the bias. State D has not yet been reliably achieved in circuit simulations. https://doi.org/10.1371/journal.pone.0085175.g002 We refer to the A state as the null state. The null state occurs when an AHaH node assigns the same weight value to each synapse and outputs the same state for every pattern. The null state is mostly useless computationally, and its occupation is inhibited by bias weights. Through strong anti-Hebbian learning, the bias weights force each neuron to split the output space equally. As the neuron locks on to a stable bifurcation, the effect of the bias weights is minimized and the decision margin is maximized via AHaH learning on the input weights. Recall Turing’s idea of a network of NAND gates connected by modifier devices, as mentioned in the Historical Background section. The AHaH nodes extract independent component states, the alphabet of the data stream. As illustrated in Figure 3, by providing the sign of the output of AHaH nodes to static NAND gates, a universal reconfigurable logic gate is possible. Configuring the AHaH attractor states configures the logic function. We can do even better than this, however.


Figure 3. Universal reconfigurable logic. By connecting the output of AHaH nodes (circles) to the input of static NAND gates, one may create a universal reconfigurable logic gate by configuring the AHaH node attractor states. The structure of the data stream on the binary encoded channels supports the AHaH attractor states (Figure 2). Through configuration of node attractor states the logic function of the circuit can be configured, and all logic functions are possible. If inputs are represented as a spike encoding over four channels, then AHaH node attractor states can attain all logic functions without the use of NAND gates. https://doi.org/10.1371/journal.pone.0085175.g003 We can achieve all logic functions directly (without NAND gates) if we define a spike logic code in which each logic state is mapped to a spike pattern over a pair of channels, as shown in Table 1. As any algorithm or procedure can be attained from combinations of logic functions, AHaH nodes are building blocks from which any algorithm can be built. This analysis of logic is necessary to prove that AHaH attractor states can support any algorithm, not that AHaH computing is intended to replace modern methods of high speed digital logic.
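As a sketch of the four-channel spike logic described above, the fragment below encodes each logic input over a pair of spike channels (one channel spikes for ‘0’, the other for ‘1’) and evaluates a fixed linear node. The weight vector is a hypothetical one chosen here to realize NAND; it is not a value taken from the paper:

```python
def encode(a, b):
    """Encode two logic inputs over four spike channels:
    pair (a0, a1) spikes on a == 0 / a == 1, likewise pair (b0, b1)."""
    return [1 - a, a, 1 - b, b]

def node(x, w):
    """AHaH-style linear evaluation: positive sum -> logical 1, negative -> 0."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if y > 0 else 0

w_nand = [3, -1, 2, -2]  # hypothetical weights realizing NAND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, node(encode(a, b), w_nand))
# prints the NAND truth table: 0 0 1 / 0 1 1 / 1 0 1 / 1 1 0
```

Other weight vectors over the same four channels yield other logic functions, which is what allows configuration of attractor states to configure logic.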


Table 1. Spike logic patterns. https://doi.org/10.1371/journal.pone.0085175.t001

AHaH Attractors are Bits Every AHaH attractor consists of a state/anti-state pair that can be configured and therefore appears to represent a bit. In the limit of only one synapse and one input line activation, the state of the AHaH node is the state of the synapse, just like a typical bit. As the number of simultaneous inputs grows past one, the AHaH bit becomes a collective over all interacting synapses. For every AHaH attractor state that outputs a ‘1’, for example, there exists an equal and opposite AHaH attractor state that will output a ‘−1’. The state/anti-state property of the AHaH attractors follows mathematically from ICA, since ICA is in general not able to uniquely determine the sign of the source signals. The AHaH bits open up the possibility of configuring populations to achieve computational objectives. We take advantage of AHaH bits in the AHaH clustering and AHaH motor controller examples presented later in this paper. It is important to understand that AHaH attractor states are a reflection of the underlying statistics of the data stream and cannot be fully understood as just the collection of synapses that compose them. Rather, it is both the collection of synapses and the structure of the information being processed that result in an AHaH attractor state. If we regard the data being processed as a sequence of measurements of the AHaH bit’s state, we arrive at an interesting observation: the act of measurement not only affects the state of the AHaH bit, it actually defines it. Without the data structure imposed by the sequence of measurements, the state would simply not exist. This bears some similarity to ideas that emerge from quantum mechanics.

AHaH Memristor Circuit Although we discuss a functional or mathematical representation of the AHaH node, AHaH computing necessarily has its foundation in a physical embodiment or circuit. The AHaH rule is achievable if one provides for competing adaptive dissipating pathways. The modern memristor provides us with just such an adaptive pathway. Two memristors provide us with two competing pathways. While some neuromorphic computing research has focused on exploiting the synapse-like behavior of a single memristor [68], [83] or using two serially connected memristive devices with different polarities [67], we implement synaptic weights via a differential pair of memristors with the same polarities (Figure 4) [45]–[47] acting as competing dissipation pathways.


Figure 4. A differential pair of memristors forms a synapse. A differential pair of memristors is used to form a synaptic weight, allowing for both a sign and magnitude. The bar on the memristor is used to indicate polarity and corresponds to the lower potential end when driving the memristor into a higher conductance state. The two memristors form a voltage divider, causing the voltage at node y to be some value between the two driving voltages. When driven correctly in the absence of Hebbian feedback, a synapse will evolve to a symmetric state where V_y = 0 V, alleviating issues arising from device inhomogeneities. https://doi.org/10.1371/journal.pone.0085175.g004 The circuits capable of achieving AHaH plasticity can be broadly categorized by the electrode configuration that forms the differential synapses as well as by how the input activation (current) is converted to a feedback voltage that drives unsupervised anti-Hebbian learning [46], [47]. Synaptic currents can be converted to a feedback voltage statically (resistors or memristors), dynamically (capacitors), or actively (operational amplifiers). Each configuration requires unique circuitry to drive the electrodes so as to achieve AHaH plasticity, and multiple driving methods exist. The result is that a very large number of AHaH circuits exist, and it is well beyond the scope of this paper to discuss all configurations. Herein, a ‘2-1’ two-phase circuit configuration is introduced because of its compactness and because it is amenable to mathematical analysis. The functional objective of the AHaH circuit shown in Figure 5 is to produce an analog output on electrode y, given an arbitrary spike input of length n with k active inputs and n − k inactive (floating) inputs. The circuit consists of one or more memristor pairs (synapses) sharing a common electrode labeled y. Driving voltage sources are indicated with circles and labeled with an S, B or F, referring to spike, bias, or feedback respectively.
The individual driving voltage sources for the spike inputs of the AHaH circuit are labeled S0, S1, and so on. The driving voltage sources for the bias inputs are labeled B0, B1, and so on. The driving voltage source for supervised and unsupervised learning is labeled F. The subscript values a and b indicate the positive and negative dissipative pathways, respectively.
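The differential-pair synapse of Figure 4 can be sketched numerically. With one memristor driven at +V and the other at −V, the unloaded output node sits at the conductance-weighted average of the two rails, so the signed synaptic weight is proportional to the normalized conductance difference. The conductance values below are purely illustrative:

```python
def synapse_voltage(g_a, g_b, v=0.5):
    """Output voltage of a differential memristor pair driven at +v and -v.
    The pair forms a voltage divider: V_y = v * (g_a - g_b) / (g_a + g_b)."""
    return v * (g_a - g_b) / (g_a + g_b)

# A conductance imbalance encodes both the sign and magnitude of the weight.
print(synapse_voltage(2.0, 1.0))   # positive weight
print(synapse_voltage(1.0, 1.0))   # balanced pair -> 0 V (symmetric state)
print(synapse_voltage(1.0, 2.0))   # negative weight
```

The balanced case corresponds to the symmetric V_y = 0 V state described in the Figure 4 caption.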


Figure 5. AHaH 2-1 two-phase circuit diagram. The circuit produces an analog voltage signal on the output at node y given a spike pattern on its inputs labeled S0, S1, and so on. The bias inputs B0, B1, and so on are equivalent to the spike pattern inputs except that they are always active when the spike pattern inputs are active. F is a voltage source used to implement supervised and unsupervised learning via the AHaH rule. The polarity of the memristors for the bias synapse(s) is inverted relative to the input memristors. The output voltage, V_y, contains both state (positive/negative) and confidence (magnitude) information. https://doi.org/10.1371/journal.pone.0085175.g005 During the read phase, driving voltage sources Sa and Sb are set to +V and −V, respectively, for all active inputs. Inactive S inputs are left floating. The number of bias inputs to drive is either fixed or a function of the number of active inputs, and the driving voltage sources Ba and Bb are driven analogously for all bias pairs. The combined conductance of the active inputs and biases produces an output voltage on electrode y. This analog signal contains useful confidence information and can be digitized via the sign function to either a logical 1 or a 0, if desired. During the write phase, driving voltage source F is set either to a feedback voltage determined by the output (unsupervised) or to an externally applied teaching signal (supervised). The polarities of the driving voltage sources Sa and Sb are inverted to −V and +V. The polarity switch causes all active memristors to be driven to a less conductive state, counteracting the read phase. If this dynamic counteraction did not take place, the memristors would quickly saturate into their maximally conductive states, rendering the synapses useless.
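The read/write cycle above can be caricatured with a toy incremental device model: a bounded conductance that moves up when driven forward and down when the polarity is inverted. All constants are invented for illustration. Skipping the write phase drives the devices into saturation, while the polarity-inverting write phase holds them in their useful operating range:

```python
G_MIN, G_MAX, STEP = 0.1, 1.0, 0.02

def clamp(g):
    # Physical conductance is bounded between G_MIN and G_MAX.
    return max(G_MIN, min(G_MAX, g))

def run(cycles, write_phase):
    """Toy two-phase cycle for one active memristor pair."""
    g_a = g_b = 0.5
    for _ in range(cycles):
        # Read phase: forward drive nudges both devices toward higher conductance.
        g_a, g_b = clamp(g_a + STEP), clamp(g_b + STEP)
        if write_phase:
            # Write phase: inverted polarity decays both devices,
            # counteracting the conductance gained during the read.
            g_a, g_b = clamp(g_a - STEP), clamp(g_b - STEP)
    return g_a, g_b

print(run(100, write_phase=False))  # both devices saturate at G_MAX
print(run(100, write_phase=True))   # devices stay in mid-range
```

This is only the counteraction mechanism; the Hebbian feedback that biases the cycle toward one pathway is discussed next.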
A more intuitive explanation of the above feedback cycle is that “the winning pathway is rewarded by not getting decayed.” Each synapse can be thought of as two competing energy dissipating pathways (positive or negative evaluations) that are building structure (differential conductance). We may apply reinforcing Hebbian feedback by (1) allowing the winning pathway to dissipate more energy or (2) forcing the decay of the losing pathway. If we choose method (1), then we must at some future time ensure that we decay the conductance before device saturation is reached. If we choose method (2), then we achieve both decay and reinforcement at the same time.
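Method (2), decaying only the losing pathway, can be sketched as follows. The update schedule, decay step, and bounds are hypothetical values chosen for illustration:

```python
G_MIN, DECAY = 0.1, 0.05

def reinforce(g_a, g_b, winner, steps=5):
    """Hebbian feedback via method (2): only the losing pathway is decayed,
    so the differential weight (g_a - g_b) drifts toward the winner."""
    for _ in range(steps):
        if winner == "a":
            g_b = max(G_MIN, g_b - DECAY)
        else:
            g_a = max(G_MIN, g_a - DECAY)
    return g_a, g_b

g_a, g_b = reinforce(0.5, 0.5, winner="a")
print(g_a - g_b)  # positive: the synapse has been reinforced toward the 'a' pathway
```

Because the same decay step both counteracts read-phase conductance growth and strengthens the differential weight, method (2) combines decay and reinforcement in a single operation, as stated above.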