Weight-dependent STDP in TiO2-based memristors

STDP is one of the most widely studied plasticity rules for spiking neural networks. In its pure form it relies on the premise that the relative timing between pre- and post-synaptic spike events is the major determinant of both the direction (potentiation/depression) and the magnitude of synaptic weight changes. Recently the hardware-friendly, pulse-based biasing scheme shown in Fig. 1a–c has been proposed as a possible method for implementing STDP in memristor-based synapses17,18,35. The memristor’s resistive state (conductance) is interpreted as the equivalent of a synaptic efficacy (weight). To implement plasticity events, the scheme exploits the inherent capability of some memristive devices to act as thresholded voltage time-integrators, that is to change their resistive state as a function of input voltage, so long as its magnitude exceeds a certain threshold (the switching threshold). When the pre-synaptic neuron spikes, a prolonged low-voltage pulse is applied across the memristor. This pulse is by itself unable to induce any resistive switching (Fig. 1a). Spiking of the post-synaptic neuron, on the other hand, leads to the application of a brief, biphasic, bipolar pulse (Fig. 1b) that causes the memristor to undergo long-term depression (LTD). Concurrent pre- and post-synaptic terminal spiking causes the memristor to sense the superposition of the pre- and post-synaptic spike waveforms and thereby undergo long-term potentiation (LTP; Fig. 1c).

Figure 1: Weight-dependent STDP in memristors. (a–c) Memristor electrical biasing scheme used to test STDP. V_th+, V_th−: memristor switching thresholds. Data for individual device thresholds in Supplementary Table 2. Voltage levels used to induce LTP and LTD in Supplementary Table 3. Red shading: supra-threshold portions of the input affecting the memristor resistive state. (d) Typical experimental results from TiO2 device. Black trace: raw data; blue trace: 10-point moving average; red trace: exponential fitting. Red shading: LTP. Blue shading: LTD. No shading: neutral region, no plasticity triggered. (e) Experimental data and exponential fittings describing STDP magnitude (relative change in device conductance, Δg/g) as a function of initial memristor conductance. Red line: LTP fitting. Blue line: LTD fitting. Black dashed line: zero conductance change level. Same data as in d.

We fabricated TiO2-based devices (see Methods) and studied their behaviour during exposure to trains of STDP events. Each device under test (DUT) was exposed to four blocks of events, each consisting of 2,400 individual events: LTD-inducing post-only events; LTP-inducing combined pre- and post-events; LTD events again; and finally, plasticity-neutral pre-events only. Figure 1d shows typical measured results from our prototype DUTs for all mentioned electrical biasing schemes. First, we observe that the STDP rules are followed throughout the entire test, including the plasticity-neutrality of pre-only events (confirmed by experiments where pre-only events were applied at the high-conductance boundary of the DUT’s operating range; see Supplementary Fig. 2). Next, we observe the marked dependence of changes in resistive state on the running resistive state (DUT conductance g) for both LTP and LTD (Fig. 1e). Such dependence of conductance changes on the actual memristive state has commonly been observed in memristors, including both metal-oxide25 and phase-change36 implementations. In supervised learning rules, such as the perceptron rule, this property is undesirable because updates independent of memristive state are required32. Here we specifically leverage this property to enable, for the first time, unsupervised learning in a practical network, in a manner similar to the work presented previously in ref. 37, which is based on simulations of PCM models.

The experimental results in Fig. 1d,e suggest that the STDP rule being implemented can be described for each plasticity event by

where PRE and POST are binary values indicating whether a pre- or post-spike has occurred in the given event, respectively, whilst f+(g) and f−(g) are functions that capture the influence of DUT conductance on LTP and LTD strength (also see Supplementary Note 2 and Supplementary Fig. 3). Normalizing to obtain relative changes in g and rearranging we get

where the normalized LTP and LTD functions, f+(g)/g and f−(g)/g, are both fitted by exponentials in Fig. 1e.
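As a sketch of how such a rule can be written, one form consistent with the definitions above is given below. It rests on an assumption that may not match the exact expression of equation (1): that f+(g) and f−(g) denote the magnitudes of the net conductance change produced by a single LTP event and a single LTD event, respectively.

```latex
\Delta g = \mathrm{POST}\cdot\bigl[\mathrm{PRE}\cdot f_{+}(g) - (1-\mathrm{PRE})\cdot f_{-}(g)\bigr]

\frac{\Delta g}{g} = \mathrm{POST}\cdot\left[\mathrm{PRE}\cdot\frac{f_{+}(g)}{g} - (1-\mathrm{PRE})\cdot\frac{f_{-}(g)}{g}\right]
```

Under this reading, a post-only event (PRE=0, POST=1) gives Δg = −f−(g), a combined event (PRE=1, POST=1) gives Δg = +f+(g), and a pre-only event (POST=0) leaves g unchanged, in line with the behaviour reported in Fig. 1d.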

Plotting Δg/g versus g for both LTP and LTD reveals that our solid-state synapse features inherently self-stabilizing plasticity (Fig. 1e): at higher conductance levels, further increases in conductance (LTP) become progressively smaller. Similarly, at the bottom end of the conductance scale LTD induction becomes increasingly ineffective. The gradual and monotonic dependence of weight changes on the running value of weight is an essential feature for memory models of unsupervised learning. If a stochastic data stream that triggers LTP and LTD with probabilities p and (1−p), respectively, is fed into the DUT, we can expect its conductance to converge towards a unique equilibrium point. In other words, the memristive synapse should be able to encode and store in its resistive state the conditional probability p(PRE|POST=1) that a given postsynaptic spike is preceded by a presynaptic spike at the synapse within a short time interval. For instance, consider a memristive synapse that is exposed to STDP events that consist of a mixture of 90% LTP events and 10% LTD events. We can expect the DUT conductance to eventually stabilise close to the upper boundary of the DUT’s resistive state operating range.
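The convergence argument can be illustrated with a small simulation. This is a minimal sketch, assuming exponential forms for the relative LTP and LTD step sizes. The function names and every numerical parameter (G_MIN, G_MAX, a, k, g0) are hypothetical and are not fitted to the measured devices.

```python
import math
import random

G_MIN, G_MAX = 1.0, 2.0  # hypothetical operating range, arbitrary units


def rel_ltp(g, a=0.01, k=4.0):
    """Assumed relative LTP step; shrinks as g approaches G_MAX."""
    return a * math.exp(-k * (g - G_MIN) / (G_MAX - G_MIN))


def rel_ltd(g, a=0.01, k=4.0):
    """Assumed relative LTD step; shrinks as g approaches G_MIN."""
    return a * math.exp(-k * (G_MAX - g) / (G_MAX - G_MIN))


def simulate(p_ltp, n_events=20000, g0=1.5, seed=0):
    """Apply random LTP/LTD events and return the final conductance."""
    rng = random.Random(seed)
    g = g0
    for _ in range(n_events):
        if rng.random() < p_ltp:
            g += g * rel_ltp(g)   # LTP event with probability p_ltp
        else:
            g -= g * rel_ltd(g)   # LTD event with probability 1 - p_ltp
    return g


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, round(simulate(p), 3))
```

In this toy model the conductance settles where the expected LTP and LTD steps balance, p·rel_ltp(g) = (1−p)·rel_ltd(g), so a higher LTP fraction yields a higher resting conductance.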

Memristor synapses can encode conditional probabilities

We experimentally tested the theoretical prediction that conditional probabilities can be encoded and stored in the resistive state of a memristor. We performed four measurement runs on the same test device. Each run consisted of 10 blocks of plasticity events (10^4 events per block, that is, 10^5 events per run, blue dots in Fig. 2). Individual plasticity events were randomly chosen to be LTP events with probability p_LTP and LTD events with probability 1−p_LTP, where the probability of an LTP event was fixed within each block. In runs 2 and 4, p_LTP was 95%, 85%, ..., 5% for blocks 1–10, respectively, that is, the probability of LTP events was decreased after each event block. In runs 1 and 3, the same LTP probabilities were tested, but in random order (Supplementary Table 4 and Supplementary Note 3). At the end of each block the final resistive state of the memristor was measured (average of 25 read-outs after the end of each block).

Figure 2: TiO2-based memristors encode conditional probabilities. (a) Final memristor conductance after application of blocks of 10^4 input events featuring different LTP/LTD compositions. Blue line corresponds to linear fit for runs 2–4. Error bars: s.d., number of samples (individual resistive state readings) per data point n=25. Traces showing resistive state migration during two typical blocks: (b) one where the device is overall depressed (third block in run 2: Block 2.3) and (c) one where it is potentiated (first block in run 2: Block 2.1).

The results of the experiment are shown in Fig. 2. After a burn-in phase observed during the first run (10^5 events), during which the memristor gradually reaches its normal operating range, we obtained consistent convergence points for the remaining three runs (3 × 10^5 events), and a clear mapping between LTP/LTD composition and convergence conductance emerges. Converged conductance data from runs 2–4 (that is, excluding burn-in) are first pooled (convergence points at each LTP/LTD composition are averaged) and then fitted to a linear function (equation and fitting parameters in Supplementary Note 4) of converged conductance versus LTP/LTD composition by least squares regression. The root mean squared error of this fitting is approximately . Moreover, we notice that the runs where the order of the LTP/LTD composition points was scrambled (1 and 3) show less well-behaved convergence points. Attempting to extrapolate memristor behaviour by exponential fitting, as presented in Supplementary Fig. 4, indicates that even 10^4 events seem insufficient to achieve convergence given the choice of biasing parameters (Supplementary Note 5). We believe that this could potentially be addressed as more realistic memristor models appear. Thus, we can conclude that TiO2 memristor-based synapses appear to be able to practically support the encoding of conditional probabilities p(PRE|POST=1) in their resistive states.
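The pooling and fitting procedure described above can be sketched as follows. This is an illustrative outline only: the function name pooled_linear_fit and its argument layout are hypothetical, each run is assumed to be already ordered by LTP/LTD composition, and the numbers in the usage example are synthetic placeholders, not measured conductances.

```python
def pooled_linear_fit(p_ltp, runs):
    """Average convergence points across runs, then fit a straight line.

    p_ltp: list of LTP probabilities, one per block composition.
    runs:  list of runs; each run lists the converged conductance at the
           corresponding entry of p_ltp (same order in every run).
    Returns (slope, intercept, rmse) of the least squares line.
    """
    n = len(p_ltp)
    # Pool: average the convergence points at each composition.
    pooled = [sum(run[i] for run in runs) / len(runs) for i in range(n)]
    mean_x = sum(p_ltp) / n
    mean_y = sum(pooled) / n
    sxx = sum((x - mean_x) ** 2 for x in p_ltp)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(p_ltp, pooled))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Root mean squared error of the fit on the pooled points.
    residuals = [y - (slope * x + intercept) for x, y in zip(p_ltp, pooled)]
    rmse = (sum(r * r for r in residuals) / n) ** 0.5
    return slope, intercept, rmse


if __name__ == "__main__":
    compositions = [0.05 + 0.10 * i for i in range(10)]  # 5%, 15%, ..., 95%
    # Synthetic placeholder data: three runs scattered around one line.
    fake_runs = [[1.0 + 2.0 * p + d for p in compositions]
                 for d in (-0.1, 0.0, 0.1)]
    print(pooled_linear_fit(compositions, fake_runs))
```

Pooling before fitting gives each composition equal weight regardless of how many runs contribute to it.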

Probabilistic neural networks with memristor synapses

The ability of individual memristors to encode conditional probabilities can be leveraged for the implementation of self-adapting spiking neural networks. In particular, WTA networks38 have repeatedly been proposed for hardware implementations39,40,41,42, motivated in part by the fact that WTA structures play an important role in cortical information processing43. Recent rigorous analyses revealed that WTA networks consisting of stochastic spiking neurons subject to weight-dependent STDP are capable of performing probabilistic inference that essentially carries out clustering of input patterns. While a number of different types of WTA networks have been considered35,44,45,46,47, optimal parameter adaptation is in any case accomplished by weight-dependent STDP rules of the form , that is, by rules similar to the memristor-implemented plasticity rule from equation (1).

To test whether memristor-based synapses can perform adequately as components of WTA networks, we implemented a WTA network that consisted of two stochastic spiking neurons with four inputs each. All four input synapses to one WTA neuron were implemented by TiO2-based devices, while the synapses to the other neuron were implemented in software (Fig. 3a). This hybrid network allowed us to directly compare software-simulated synaptic connections with memristive synapses in the same set-up and with exactly the same inputs. It also allowed us to directly manipulate the software synapses and study the influence on memristive plasticity.

Figure 3: Learning in a WTA network with a mixture of software and memristor synapses. (a) Diagram of the 2-neuron, WTA network used in this work. (b) Evolution of neuron specializations S_i to patterns 0110 and 1001 as weights change over successive events, illustrating the interplay between the two neurons. Inset: close-up of first 60 trials. (c) Computed membrane potentials of each neuron to both prototype patterns according to their weights at every trial, illustrating the intrinsic pattern preferences of each neuron, that is, independent of their interaction in the WTA network. (d) Evolution of hardware (synapses 0–3, enclosed in thick, black frame) and software (synapses 4–7) weights. (e,f) Responses of the WTA network to the initial (e) and final (f) 41 input samples. The fire count of both the hardware synapse neuron (orange) and the software synapse neuron (turquoise) is shown for patterns 0110 and 1001, and patterns that differ from these prototypes in one position (0110 δ and 1001 δ ). The different pattern groups are perfectly segregated by the end of the run.

The 2-neuron probabilistic WTA network was implemented on an in-house developed instrumentation board for memristor device characterization48. The two artificial neurons, WTA lateral inhibition and synapses feeding one of the neurons were all implemented in software on the board’s microcontroller unit. During each experimental run, 1,200 four-bit patterns were presented to the network at the inputs y. Determining the values of y begins by randomly and equiprobably drawing a pattern to be presented from a set of prototype test patterns (in our case 0110 and 1001). Next, each bit in the selected pattern is flipped with a probability of 10% so that the network is presented with noisy instantiations of the prototype patterns. The resulting generated input vector is then multiplied by the weight vectors of both neurons and translated into membrane potential values, one for each neuron, as per equation (3):

where U_i(y,t) denotes the membrane potential for neuron i during event t, θ_i(t) an adaptive excitability term that homoeostatically regulates neuron activity and w_i the weight vector from inputs y to neuron i. The symbol · represents the dot product operator. Importantly, whilst U_i represents the membrane potential of neuron i for the purposes of driving its firing behaviour, it does not directly translate to a physical voltage value to be applied to all synapse terminals (pre or post) it is connected to. Neuron firing events are instead translated into appropriate pre- or post-type voltage waveforms that are used to bias the affected memristor synapses. The homoeostatic term θ_i(t) has been used before for memristor learning17 and has been theoretically justified in ref. 44 for unsupervised learning in probabilistic WTA networks. By reducing the propensity to fire for neurons that show high average response, homoeostasis ensures that both neurons participate in the WTA competition over the long run (details in Methods section).
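The input-generation step described above (equiprobable draw of a prototype, followed by independent 10% bit flips) can be sketched as follows. The function name draw_noisy_pattern and the use of Python's random module are illustrative choices, not the microcontroller implementation used in the experiments.

```python
import random

PROTOTYPES = ("0110", "1001")  # prototype test patterns from the text


def draw_noisy_pattern(rng, flip_prob=0.10):
    """Draw a prototype equiprobably, then flip each bit independently.

    Returns (prototype_string, noisy_bits), where noisy_bits is a list of
    four integers in {0, 1} forming the input vector y.
    """
    prototype = rng.choice(PROTOTYPES)
    bits = [int(c) for c in prototype]
    # XOR with 1 flips a bit; each bit flips with probability flip_prob.
    noisy = [(b ^ 1) if rng.random() < flip_prob else b for b in bits]
    return prototype, noisy


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(draw_noisy_pattern(rng))
```

With four bits and a 10% flip probability per bit, roughly two thirds of the presented patterns (0.9^4 ≈ 0.66) are exact copies of a prototype and most of the rest differ from it in a single position.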

The probability p_i(y,t) with which neuron i wins the WTA competition and therefore spikes at event t is given by

Using computed p_i values for each pattern at each time step we can define a specialization metric S that directly quantifies how attuned each neuron is to the two prototype input patterns:

where S_i(t) is the specialization of neuron i at time t and takes values between 1 (perfect specialization on 1001) and −1 (perfect specialization on 0110).
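The winner-selection step can be sketched as follows. This is an assumption-laden illustration: it uses a softmax over membrane potentials, a common choice in probabilistic WTA models, and the expression actually used for p_i in this work may differ. The function names win_probabilities and draw_winner are hypothetical.

```python
import math
import random


def win_probabilities(potentials):
    """Assumed softmax: p_i is proportional to exp(U_i), normalized to 1."""
    m = max(potentials)  # subtract the maximum for numerical stability
    exps = [math.exp(u - m) for u in potentials]
    total = sum(exps)
    return [e / total for e in exps]


def draw_winner(potentials, rng):
    """Pick exactly one winning neuron index according to p_i."""
    probs = win_probabilities(potentials)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(win_probabilities([0.5, -0.5]))
    print(draw_winner([0.5, -0.5], rng))
```

Because the probabilities sum to one and a single index is drawn, exactly one neuron fires per event, and the neuron with the higher membrane potential is the more likely winner.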

By definition, at every event exactly one of the neurons wins and fires, thus triggering plasticity at its synapses. In the case of software synapses, weights are updated through a simple STDP rule that aims to approximately mirror memristor plasticity. The variability in resulting STDP-driven weight changes Δw and measurement noise observed in memristor synapses have both been included in the software synapse plasticity mechanism (see Methods). In the case of the hardware synapses the STDP conditions that determine whether LTP or LTD is required are the same as for their software counterparts, but the LTP and LTD events are translated into pulse voltage stimulation and therefore the magnitude of weight change is inherently set by each memristor. For the purposes of this experiment and since the non-invasiveness of the pre-only event has already been confirmed (Fig. 1), the pulsing scheme for LTP and LTD is reduced to only the above-threshold portions of the original waveforms, that is, both LTP and LTD are represented by simple square-waves of appropriate amplitude. To map device resistive states onto weights all memristive synapses were first subjected to the protocol described in Fig. 1. Estimated maximum and minimum operational conductance values (extracted from the constant term of exponential fittings to traces in Fig. 1d—also see Supplementary Fig. 5) were mapped linearly to a weight range of [−2.2, +2.2]. The conductance-weight mappings are summarized in Supplementary Table 3.
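The linear conductance-to-weight mapping can be sketched as follows. The function name conductance_to_weight and the conductance values in the usage example are hypothetical, and the sketch assumes that the minimum operational conductance maps to −2.2 and the maximum to +2.2. The per-device values are those of the supplementary table cited above.

```python
W_MIN, W_MAX = -2.2, 2.2  # weight range stated in the text


def conductance_to_weight(g, g_min, g_max):
    """Map conductance g linearly from [g_min, g_max] onto [W_MIN, W_MAX]."""
    frac = (g - g_min) / (g_max - g_min)  # 0 at g_min, 1 at g_max
    return W_MIN + frac * (W_MAX - W_MIN)


if __name__ == "__main__":
    # Hypothetical operating range, arbitrary units.
    g_min, g_max = 1.0, 3.0
    for g in (1.0, 2.0, 3.0):
        print(g, conductance_to_weight(g, g_min, g_max))
```

A device sitting in the middle of its operating range therefore corresponds to a weight of 0, which matches the initialization of the weights close to 0 at the start of a run.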

Results from a WTA network experiment (run no. 1) are shown in Fig. 3. Both hardware and software synaptic weights w_ij were initialized close to 0 (see Methods section) and subsequently the network was allowed to react to the incoming patterns freely. According to theoretical WTA models, unsupervised synaptic adaptations through STDP should lead to a clustering of inputs such that each neuron is preferentially activated by one of the prototype patterns and noisy variations of it. Figure 3 demonstrates this behaviour in our set-up with memristive synapses. The specialization evolution in Fig. 3b shows how, after a brief initial phase of uncertainty where the neurons are approximately equally attuned to both patterns and neither can claim dominance over either pattern (approximately first 20–30 samples), the hardware synapse neuron develops a clear preference for pattern 0110 (specialization S approaches −1). Similarly, we can use the weights of software and hardware synapses at each trial to plot computed membrane potentials for each neuron in response to each pattern. This is shown in Fig. 3c, where we observe how at the beginning of the run neither neuron has any intrinsic preference for any pattern (that is, independent of the neuron–neuron interaction through the WTA); this only starts developing afterwards. The robustness of these results was confirmed by repeating the experiment three times in total. Results from all three runs are summarized in Supplementary Fig. 6 and Supplementary Note 6.

Examining the evolution of weight values throughout the run (Fig. 3d) we observe that the hardware synapse weights experience noisy and slow drift from their initial values. To quantify this the evolution of each weight over trials was fitted to an exponential function and the s.d. of the residual was then computed. This yielded estimates of both the noise levels and the overall weight change for each synapse over the trial (for full results see Supplementary Note 7 and Supplementary Fig. 7). The software synapses concurrently experience similarly imperfect drift towards their final state. For comparison, see Supplementary Figs 8 and 9 in the case where software synapses are noise-free. These results are confirmed by Fig. 3e,f where we see a substantially clearer classification of pattern 0110 and related patterns different from 0110 in only one position (0110 δ ) on the one hand (purple shading) and 1001 with 1001 δ (patterns different from 1001 in only one position) on the other hand (green shading) towards the end of the experiment versus the beginning. Specifically, at the beginning of the run patterns 1001 and 1001 δ cause the neuron that ultimately assigns itself to them (software synapse) to fire only approximately 56% of the time whilst similarly the hardware synapse neuron responds to its corresponding patterns (0110 and 0110 δ ) approximately 77% of the time. In contrast, at the end of the run classification accuracy increases to 100% for both neurons. Thus, the WTA network successfully segregates the prototype patterns despite the presence of noise. This result was achieved in a fully unsupervised manner. An example case of how the same test evolves when software synapse imperfections are suppressed is shown in Supplementary Note 8 and Supplementary Figs 8 and 9.

Finally, to demonstrate that the WTA network is capable of not only learning a pattern but also if demanded forgetting and relearning it, a further set of experiments was conducted. This consisted of two further, consecutive WTA learning runs (runs no. 2 and 3) immediately following the main run from Fig. 3 (by the end of which we recall the memristor synapses had specialized their neuron to pattern 0110). At the beginning of each of these additional runs the software synapses were initialized such that the network specialization acquired during the immediately preceding learning run was reversed (hardware synapses were left unchanged). Under these circumstances the memristor-based synapses are expected to respond by flipping their intrinsic preference to the opposite pattern. Results are shown in Fig. 4.

Figure 4: Reversible learning is supported in WTA networks using TiO2 memristor-based synapses. (a–e) First run attempting to unteach the pattern recognition abilities gained in Fig. 3. (a) Evolution of neuron specializations S_i to patterns 0110 and 1001 as weights change over successive events, illustrating the interplay between the two neurons. (b) Computed membrane potentials of each neuron to both prototype patterns according to their weights at every trial, illustrating the intrinsic pattern preferences of each neuron, that is, independent of their interaction in the WTA network. (c) Evolution of hardware (synapses 0–3, enclosed in thick, black frame) and software (synapses 4–7) weights. (d,e) Responses of the WTA network to the initial (d) and final (e) 41 input samples. The fire count of both the hardware synapse neuron (orange) and the software synapse neuron (turquoise) is shown for patterns 0110, 1001 and patterns that differ from these prototypes in one position (0110 δ and 1001 δ ). (f–j) Corresponding data as in a–e for second run attempting to reteach the memristor synapses to prefer pattern 0110. The abrupt changes between final and initial responses over consecutive experiments mainly arise from the different initializations of the software synapses in each case.

In the case of the first additional run, the software synapses were initialized in such a way as to instantly reverse the preferred pattern-to-neuron mapping outcome of the previous learning session and start the learning run with the software synapse neuron, rather than the memristor synapse neuron, more responsive to pattern 0110. Such initialization should induce the memristor synapses to attempt specializing on pattern 1001 instead. The top half of Fig. 4 shows that this is indeed the case: at the end of the run the hardware synapse neuron has lost its intrinsic preference for pattern 0110 and begun switching to 1001, as evidenced by the membrane potential plot (Fig. 4b), which allowed the software neuron to consolidate its dominance of 0110 (Fig. 4a). Simultaneously, the software synapse weights remain relatively static around their extreme values, as initialized. The second additional run similarly initializes the software synapses appropriately to guide the memristor synapses to re-specialize on pattern 0110. This successfully occurs, as evidenced by Fig. 4f–j and confirmed by additional runs shown in Supplementary Fig. 10 and Supplementary Note 9. In both cases, the fire count histograms (Fig. 4d,e,i,j) show how the initial classification preferences of each neuron become entrenched during each run as a result of the combined changes in both software and hardware synapse weights, with hardware synapses mainly driving the process (Fig. 4b,g and Supplementary Table 5).