The multi-memristive synapse

The concept of the multi-memristive synapse is illustrated schematically in Fig. 1. In such a synapse, the synaptic weight is represented by the combined conductance of N devices. By using multiple devices to represent a synaptic weight, the overall dynamic range and resolution of the synapse are increased. For the realization of synaptic efficacy, an input voltage corresponding to the neuronal activation is applied to all constituent devices. The sum of the individual device currents forms the net synaptic output. For the implementation of synaptic plasticity, only one out of N devices is selected and programmed at a time. This selection is done with a counter-based arbitration scheme where one of the devices is chosen according to the value of a counter (see Supplementary Note 1). This selection counter takes values between 1 and N, and each value corresponds to one device of the synapse. After the weight update, the counter is incremented by a fixed increment rate. Having an increment rate co-prime with the clock length N guarantees that all devices in each synapse will eventually get selected and will receive a comparable number of updates provided there is a sufficiently large number of updates. Moreover, if a single selection clock is used for all synapses of a neural network, N can be chosen to be co-prime with the total number of synapses in the network to avoid updating the same device in a synapse repeatedly.
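The counter-based arbitration described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the class name `SelectionCounter` and its methods are hypothetical, and only the behavior stated in the text (values between 1 and N, fixed increment, co-prime requirement) is modeled.

```python
from math import gcd


class SelectionCounter:
    """Hypothetical selection counter taking values 1..n_devices."""

    def __init__(self, n_devices, increment=1):
        # A co-prime increment ensures every device is eventually selected.
        if gcd(increment, n_devices) != 1:
            raise ValueError("increment must be co-prime with n_devices")
        self.n_devices = n_devices
        self.increment = increment
        self.value = 1

    def select_and_advance(self):
        """Return the device to program, then increment the counter."""
        selected = self.value
        # Wrap around so that the value stays within 1..n_devices.
        self.value = (self.value - 1 + self.increment) % self.n_devices + 1
        return selected


counter = SelectionCounter(n_devices=7, increment=3)
visited = [counter.select_and_advance() for _ in range(7)]
print(visited)  # [1, 4, 7, 3, 6, 2, 5]
```

With N = 7 and an increment of 3, seven consecutive updates visit each device exactly once before the sequence repeats.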

Fig. 1 The multi-memristive synapse concept. a The net synaptic weight of a multi-memristive synapse is represented by the combined conductance \(\left(\sum G_n\right)\) of multiple memristive devices. To realize synaptic efficacy, a read voltage signal, V, is applied to all devices. The resulting current flowing through each device is summed up to generate the synaptic output. b To capture synaptic plasticity, only one of the devices is selected at any instance of synaptic update. The synaptic update is induced by altering the conductance of the selected device as dictated by a learning algorithm. This is achieved by applying a suitable programming pulse to the selected device. c A counter-based arbitration scheme is used to select the devices that get programmed to achieve synaptic plasticity. A global selection counter whose maximum value is equal to the number of devices representing a synapse is used. At any instance of synaptic update, the device pointed to by the selection counter is programmed. Subsequently, the selection counter is incremented by a fixed amount. In addition to the selection counter, independent potentiation and depression counters can serve to control the frequency of the potentiation or depression events.

In addition to the global selection counter, additional independent counters, such as a potentiation counter or a depression counter, could be incorporated to control the frequency of potentiation/depression events (see Fig. 1). The value of the potentiation (depression) counter acts as an enable signal to the potentiation (depression) event; a potentiation (depression) event is enabled if the potentiation (depression) counter value is one, and is disabled otherwise (see Supplementary Note 2). The frequency of the potentiation (depression) events is controlled by the maximum value or length of the potentiation (depression) counter. The counters are incremented after the weight update. By controlling how often devices are programmed for a conductance increase or decrease, asymmetries in the device conductance response can be reduced.
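The enable behavior of the potentiation and depression counters can be illustrated as follows. This is a sketch under the assumptions stated in the text (an event is enabled only when the counter value is one, and the counter is incremented after each weight update); the names `EnableCounter` and `count_enabled` are hypothetical.

```python
class EnableCounter:
    """Hypothetical counter of a given length; enables an event when value == 1."""

    def __init__(self, length):
        self.length = length
        self.value = 1

    def enabled(self):
        return self.value == 1

    def advance(self):
        # Cycle through 1..length.
        self.value = self.value % self.length + 1


def count_enabled(length, n_updates):
    """Count how many of n_updates requested events are actually applied."""
    counter = EnableCounter(length)
    applied = 0
    for _ in range(n_updates):
        if counter.enabled():
            applied += 1
        counter.advance()  # incremented after the weight update
    return applied


print(count_enabled(length=2, n_updates=10))  # 5
```

A counter of length L therefore lets through one out of every L requested events, which is how the relative frequency of potentiation and depression can be tuned.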

The constituent devices of the multi-memristive synapse can be arranged in either a differential or a non-differential architecture. In the latter, each synapse consists of N devices, and one device is selected and potentiated/depressed to achieve synaptic plasticity. In the differential architecture, two sets of devices are present, and the synaptic conductance is calculated as \(G_{\mathrm{syn}} = G_{+} - G_{-}\), where \(G_{+}\) is the total conductance of the set representing the potentiation of the synapse and \(G_{-}\) is the total conductance of the set representing the depression of the synapse. Each set consists of N/2 devices. When the synapse has to be potentiated, one device from the group representing \(G_{+}\) is selected and potentiated, and when the synapse has to be depressed, one device from the group representing \(G_{-}\) is selected and potentiated.
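The two architectures differ only in how the device conductances are combined and in which device receives the potentiation pulse. The sketch below makes this concrete; the function names are hypothetical, and the fixed, deterministic conductance step `delta_g` is a simplifying assumption that ignores the nonlinearity and randomness of real devices.

```python
def synaptic_conductance(g_plus, g_minus=None):
    """Non-differential: sum of all devices. Differential: sum(G+) minus sum(G-)."""
    total = sum(g_plus)
    if g_minus is not None:
        total -= sum(g_minus)
    return total


def update_differential(g_plus, g_minus, potentiate, index, delta_g=1.0):
    """Potentiate one device of G+ (synaptic potentiation) or of G- (synaptic depression)."""
    target = g_plus if potentiate else g_minus
    target[index] += delta_g  # idealized fixed step (assumption)


g_plus, g_minus = [5.0, 5.0], [5.0, 5.0]
update_differential(g_plus, g_minus, potentiate=True, index=0)
print(synaptic_conductance(g_plus, g_minus))  # 1.0
update_differential(g_plus, g_minus, potentiate=False, index=1)
print(synaptic_conductance(g_plus, g_minus))  # 0.0
```

Note that in the differential case both synaptic potentiation and synaptic depression are realized with device potentiation pulses.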

An important feature of the proposed concept is its crossbar compatibility. In the non-differential architecture, by placing the devices that constitute a single synapse along the bit lines of a crossbar, it is possible to sum up the currents using Kirchhoff’s law and obtain the total synaptic current without the need for any additional circuitry (see Supplementary Note 3). The differential architecture can be implemented with a similar approach, where one bit line contains devices of the group \(G_{+}\) and another those of the group \(G_{-}\). The total synaptic current can then be found by subtracting the current of these two bit lines. To alter the synaptic weight, one of the word lines is activated according to the value of the selection counter to program the selected device. The scheme can also be adapted to alter the weights of multiple synapses in parallel within the constraints of the maximum current that could flow through the bit line (see Supplementary Note 3).

Multi-memristive synapses based on PCM devices

In this section, we will demonstrate the concept of multi-memristive synapses using nanoscale PCM devices. A PCM device consists of a layer of phase change material sandwiched between two metal electrodes (Fig. 2(a))40, which can be in a high-conductance crystalline phase or in a low-conductance amorphous phase. In an as-fabricated device, the material is typically in the crystalline phase. When a current pulse of sufficiently high amplitude (referred to as the depression pulse) is applied, a significant portion of the phase change material melts owing to Joule heating. If the pulse is interrupted abruptly, the molten material quenches into the amorphous phase as a result of the glass transition. To increase the conductance of the device, a current pulse (referred to as the potentiation pulse) is applied such that the temperature reached via Joule heating is above the crystallization temperature but below the melting point, resulting in the recrystallization of part of the amorphous region41. The extent of crystallization depends on the amplitude and duration of the potentiation pulse, as well as on the number of such pulses. By progressively crystallizing the amorphous region by applying potentiation pulses, a continuum of conductance levels can be realized.

Fig. 2 Synapses based on phase change memory. a A PCM device consists of a phase-change material layer sandwiched between top and bottom electrodes. The crystalline region can gradually be increased by the application of potentiation pulses. A depression pulse creates an amorphous region that results in an abrupt drop in conductance, irrespective of the original state of the device. b Evolution of mean conductance as a function of the number of pulses for different programming current amplitudes (\(I_{\mathrm{prog}}\)). Each curve is obtained by averaging the conductance measurements from 9700 devices. The inset shows a transmission electron micrograph of a characteristic PCM device used in this study. c Mean cumulative conductance change observed upon the application of repeated potentiation and depression pulses. The initial conductance of the devices is ∼5 μS. d The mean and the standard deviation (1σ) of the conductance values as a function of number of pulses for \(I_{\mathrm{prog}}\) = 100 μA measured for 9700 devices and the corresponding model response for the same number of devices. The distribution of conductance after the 20th potentiation pulse and the corresponding distribution obtained with the model are shown in the inset. e The left panel shows a representative distribution of the conductance change induced by a single pulse applied at the same PCM device 1000 times. The pulse is applied as the 4th potentiation pulse to the device. The same measurement was repeated on 1000 different PCM devices, and the mean (μ) and standard deviation (σ) averaged over the 1000 devices are shown in the inset. The right panel shows a representative distribution of one conductance change induced by a single pulse on 1000 devices. The pulse is applied as the 4th potentiation pulse to the devices. The same measurement was repeated for 1000 conductance changes, and the mean and standard deviation averaged over the 1000 conductance changes are shown in the inset. It can be seen that the inter-device and the intra-device variability are comparable. The negative conductance changes are attributed to drift variability (see Supplementary Note 4).

First, we present an experimental characterization of single-device PCM-based synapses based on doped Ge\(_2\)Sb\(_2\)Te\(_5\) (GST) and integrated into a prototype chip in 90 nm CMOS technology42 (see Methods). Figure 2(b) shows the evolution of the mean device conductance as a function of the number of potentiation pulses applied. A total of 9700 devices were used for the characterization, and the programming pulse amplitude \(I_{\mathrm{prog}}\) was varied from 50 to 120 μA. It can be seen that the mean conductance value increases as a function of the number of potentiation pulses. The dynamic range of the conductance response is limited because the change in the mean conductance decreases and eventually saturates with an increasing number of potentiation pulses. Figure 2(c) shows the mean cumulative change in conductance as a function of the number of pulses for different values of \(I_{\mathrm{prog}}\). A well-defined nonlinear monotonic relationship exists between the mean cumulative conductance change and the number of potentiation pulses. In addition, there is a granularity that is determined by how small a conductance change can be induced by applying a single potentiation pulse. Large conductance change granularities, as well as nonlinear conductance responses, both observed in the PCM characterization performed here, have been shown to degrade the performance of neural networks trained with memristive synapses34,43. Moreover, when a conductance decrease is desired, a single high-amplitude depression pulse applied to a PCM device has an all-or-nothing effect that fully depresses the device conductance to (almost) 0 μS. Such a strongly asymmetric conductance response is undesirable in memristive-device-based implementations of neural networks44, and this is a significant challenge for PCM-based synapses. Depression pulses with smaller amplitude could be applied to leave the device at a higher conductance value. However, unlike with the potentiation pulses, it is not possible to achieve a progressive depression by applying successive depression pulses.

There are also significant intra-device and inter-device variabilities associated with the conductance response in PCM devices as evidenced by the distribution of conductance values upon application of successive potentiation pulses (see Fig. 2(d)). Note that the variability observed in these devices fabricated in the 90 nm technology node is also found to be higher than that of those fabricated in the 180 nm node as reported elsewhere34. Both the mean and variance associated with the conductance change depend on the mean conductance value of the devices. We capture this behavior in a PCM conductance response model that relies on piece-wise linear approximations to the functions that link the mean and variance of the conductance change to the mean conductance value45. As shown in Fig. 2(d), this model approximates the experimental behavior fairly well.
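The structure of such a conductance response model can be sketched as below. This is only an illustration of the idea of piece-wise linear functions for the mean and standard deviation of the conductance change: the breakpoint values are hypothetical placeholders rather than fitted parameters, the Gaussian distribution of the change is an assumption, and the device's current conductance is used as a stand-in for the mean conductance value.

```python
import random

# Hypothetical breakpoints (conductance in uS -> statistic in uS); NOT fitted values.
MEAN_POINTS = [(0.0, 1.0), (5.0, 0.5), (10.0, 0.0)]
STD_POINTS = [(0.0, 0.5), (5.0, 0.4), (10.0, 0.2)]


def piecewise_linear(x, points):
    """Linearly interpolate between sorted (x, y) breakpoints, clamping outside."""
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def potentiate(g, rng):
    """Apply one potentiation pulse: add a Gaussian conductance change (assumed)."""
    mu = piecewise_linear(g, MEAN_POINTS)
    sigma = piecewise_linear(g, STD_POINTS)
    return max(0.0, g + rng.gauss(mu, sigma))  # conductance cannot be negative


rng = random.Random(0)  # fixed seed for reproducibility
g = 0.0
trajectory = []
for _ in range(20):
    g = potentiate(g, rng)
    trajectory.append(g)
```

Because the mean change shrinks as the conductance grows, the simulated trajectory saturates, qualitatively mirroring the limited dynamic range described above.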

The intra-device variability in PCM is attributed to the differences in atomic configurations associated with the amorphous phase change material created during the melt-quench process46. Inter-device variability, on the other hand, arises predominantly from the variability associated with the fabrication process across the array and results in significant differences in the maximum conductance and conductance response across devices (see Supplementary Fig. 1). To investigate the intra-device variability, we measured the conductance change on the same PCM device induced by a single potentiation pulse of amplitude \(I_{\mathrm{prog}}\) = 100 μA over 1000 trials (Fig. 2(e), left panel). To quantify the inter-device variability, we monitored the conductance change induced by a single potentiation pulse across the 1000 devices (Fig. 2(e), right panel). These experiments show that the standard deviation of the conductance change due to intra-device variability is almost as large as that due to the inter-device variability. The finding that the randomness in the conductance change is to a large extent intrinsic to the physical characteristics of the device implies that improvements in the array-level variability will not necessarily be effective in reducing the randomness.

The characterization work presented so far highlights the challenges associated with synaptic realizations using PCM devices and these can be generalized to other memristive technologies. The limited dynamic range, the asymmetric and nonlinear conductance response, the granularity and the randomness associated with conductance changes all pose challenges for realizing neural networks using memristive synapses. We now show how our concept of multi-memristive synapses can help in addressing some of those challenges. Experimental characterizations of multi-memristive synapses comprising 1, 3, and 7 PCM devices per synapse arranged in a non-differential architecture are shown in Fig. 3(a). The conductance change is averaged over 1000 synapses. One selection counter with an increment rate of one arbitrates the device selection. As the total conductance is the sum of the individual conductance values, the dynamic range scales linearly with the number of devices per synapse. Alternatively, for a learning algorithm requiring a fixed dynamic range, multi-memristive synapses can improve the effective conductance change granularity. In addition, in contrast to a single device, the mean cumulative conductance change here is linear over an extended range of potentiation pulses. With multiple devices, we can also partially mitigate the challenge of an asymmetric conductance response. At any instance, only one device is depressed, which implies that the effective synaptic conductance decreases gradually in several steps instead of the abrupt decrease observed in a single device. Moreover, using the depression counter, the cumulative conductance changes for potentiation and depression can be made approximately symmetric by adjusting the frequency of depression events. Finally, Fig. 3(b) shows that both the mean and the variance of the conductance change scale linearly with the number of devices per synapse. 
Hence, the smallest achievable mean weight change decreases by a factor of N, whereas the standard deviation of the weight change decreases by \(\sqrt N\), leading to an overall increase in weight update resolution by \(\sqrt N\) (see Supplementary Fig. 2).
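The \(\sqrt N\) scaling can be made explicit with a short calculation. It is a sketch under simplifying assumptions that are not stated in this form in the text: each pulse changes the conductance of one device by an independent amount with mean \(\mu\) and variance \(\sigma^2\), the dependence of these quantities on the conductance state is ignored, and the weight \(W\) is obtained by rescaling the total conductance by \(1/N\) so that all synapses share the same dynamic range.

```latex
% kN pulses distributed over N devices (k pulses per device):
\begin{gather*}
\mathbb{E}\left[\Delta G_{\mathrm{syn}}\right] = kN\mu, \qquad
\mathrm{Var}\left[\Delta G_{\mathrm{syn}}\right] = kN\sigma^2 \\
% rescaling to a fixed weight range:
\Delta W = \frac{\Delta G_{\mathrm{syn}}}{N}
\quad\Rightarrow\quad
\mathbb{E}\left[\Delta W\right] = k\mu, \qquad
\mathrm{Std}\left[\Delta W\right] = \frac{\sqrt{kN}\,\sigma}{N} = \frac{\sqrt{k}\,\sigma}{\sqrt{N}} \\
% a single pulse:
\mathbb{E}\left[\Delta W_{\mathrm{pulse}}\right] = \frac{\mu}{N}
\end{gather*}
```

For the same mean weight change \(k\mu\), a single-device synapse has a standard deviation of \(\sqrt{k}\,\sigma\), so the multi-memristive synapse reduces it by \(\sqrt N\), while the smallest mean weight step shrinks by a factor of N.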

Fig. 3 Multi-memristive synapses based on phase change memory. a The mean cumulative conductance change is experimentally obtained for synapses comprising 1, 3, and 7 PCM devices. The measurements are based on 1000 synapses, whereby each individual device is initialized to a conductance of ∼5 μS. For potentiation, a programming pulse of \(I_{\mathrm{prog}}\) = 100 μA was used, whereas for depression, a programming pulse of \(I_{\mathrm{prog}}\) = 450 μA was used. For depression, the conductance response can be made more symmetric by adjusting the length of the depression counter. b Distribution of the cumulative conductance change after the application of 10, 30, and 70 potentiation pulses to synapses comprising 1, 3, and 7 PCM devices, respectively. The mean (μ) and the variance (\(\sigma^2\)) scale almost linearly with the number of devices per synapse, leading to an improved weight update resolution.

Simulation results on handwritten digit classification

In this section, we study the impact of PCM-based multi-memristive synapses in the context of training ANNs and SNNs. For synaptic potentiation, the PCM conductance response model presented above was used (see Fig. 2(d)). The depression pulses are assumed to cause an abrupt conductance drop to zero in a deterministic manner, modeling the PCM asymmetry. One selection counter is used for all synapses of the network, and the weight updates are done sequentially through all synapses in the same order at every pass. Potentiation and depression counters are used to balance the frequency of potentiation and depression events for N > 1.

First, we present simulation results that show the performance of an ANN trained with multi-memristive synapses based on the nonlinear conductance response model of the PCM devices. The feedforward fully-connected network with three neuron layers is trained with the backpropagation algorithm to perform a classification task on the MNIST data set of handwritten digits47 (see Fig. 4(a) and Methods). The ideal classification performance of this network, assuming double-precision floating-point accuracy for the weights, is 97.8%. The synaptic weights are represented using the conductance values of a multi-memristive synapse model. In the non-differential architecture, a depression counter is used to improve the asymmetric conductance response and a potentiation counter to reduce the frequency of the potentiation events. As shown in Fig. 4(a), the classification accuracy improves with the number of devices per synapse. With the conventional differential architecture with two devices, the classification accuracy is below 15%. With multi-memristive synapses in the differential architecture, we can achieve test accuracies exceeding 88.9%, a performance better than the state-of-the-art in situ learning experiments on PCM despite a significantly more nonlinear and stochastic conductance response due to technology scaling34. Remarkably, accuracies exceeding 90% are possible even with the non-differential architecture, which clearly illustrates the efficacy of the proposed scheme.

Fig. 4 Applications of multi-memristive synapses in neural networks. a An artificial neural network is trained using backpropagation to perform handwritten digit classification. Bias neurons are used for the input and hidden neuron layers (white). A multi-memristive synapse model based on the nonlinear conductance response of PCM devices is used to represent the synaptic weights in these simulations. Increasing the number of devices in multi-memristive synapses (both in the differential and the non-differential architecture) improves the test accuracy. Simulations are repeated for five different weight initializations. The error bars represent the standard deviation (1σ). The dotted line shows the test accuracy obtained from a double-precision floating-point software implementation. b A spiking neural network is trained using an STDP-based learning rule for handwritten digit classification. Here again, a multi-memristive synapse model is used to represent the synaptic weights in simulations where the devices are arranged in the differential or the non-differential architecture. The classification accuracy of the network increases with the number of devices per synapse. Simulations are repeated for five different weight initializations. The error bars represent the standard deviation (1σ). The dotted line shows the test accuracy obtained from a double-precision floating-point implementation.

In a second investigation, we studied an SNN with multi-memristive synapses to perform the same task of digit recognition, but with unsupervised learning48 (see Fig. 4(b) and Methods). The weight updates are performed using an STDP rule: the synapse is potentiated whenever a presynaptic neuronal spike appears prior to a postsynaptic neuronal spike, and depressed otherwise. The amount of weight increase (decrease) within the potentiation (depression) window is constant and independent of the timing difference between the spikes. This necessitates a certain weight update granularity, which can be achieved by the proposed approach. The classification performance of the network trained with this rule using double-precision floating-point accuracy for the network parameters is 77.2%. A potentiation counter is used to reduce the frequency of the potentiation events in both the differential and non-differential architectures, and a depression counter is used in the non-differential architecture to improve the asymmetric conductance response. The network could classify more than 70% of the digits correctly for N > 9 with both the differential and the non-differential architecture, whereas the network with the conventional differential architecture with two devices has a classification accuracy below 21%.
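The timing-independent STDP rule described above can be written compactly. The sketch below is illustrative only: the function name, the symmetric `window` parameter, and the unit step size are assumptions, since the text specifies only that the update is constant within the potentiation and depression windows.

```python
def stdp_update(t_pre, t_post, window, step=1):
    """Return the requested weight change: +step, -step, or 0 outside the window."""
    dt = t_post - t_pre
    if 0 < dt <= window:
        return step  # presynaptic spike strictly before postsynaptic spike
    if -window <= dt <= 0:
        return -step  # otherwise, within the depression window
    return 0  # spikes too far apart: no update (assumed)


print(stdp_update(t_pre=3, t_post=5, window=10))  # 1
print(stdp_update(t_pre=5, t_post=3, window=10))  # -1
```

Because the step does not depend on the exact timing difference, the smallest weight change the synapse can realize directly limits how faithfully the rule can be implemented.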

In both cases, we see that the multi-memristive synapse significantly outperforms the conventional differential architecture with two devices, clearly illustrating the effectiveness of the proposed architecture. Moreover, the fact that the non-differential architecture achieves a comparable performance to that of the differential architecture is promising for synaptic realizations using highly asymmetric devices. A non-differential architecture would have a lower implementation complexity than its differential counterpart because the refresh operation34,37, which requires reading and reprogramming \(G_{+}\) and \(G_{-}\), can be completely avoided.

Experimental results on temporal correlation detection

Next, we present an experimental demonstration of the multi-memristive synapse architecture using our prototype PCM chip (see Methods) to train an SNN that detects temporal correlations in event-based data streams in an unsupervised way. Unsupervised learning is widely perceived as a key computational task in neuromorphic processing of big data. It becomes increasingly important given today’s variety of big data sources, for which often neither labeled samples nor reliable training sets are available. The key task of unsupervised learning is to reveal the statistical features of big data, and thereby shed light on its internal correlations. In this respect, detecting temporal and spatial correlations in the data is essential.

The SNN comprises a neuron interfaced to plastic synapses, with each one receiving an event-based data stream as presynaptic input spikes49,50 (see Fig. 5(a) and Methods). A subset of the data streams are mutually temporally correlated, whereas the rest are uncorrelated (see Supplementary Note 5). When the input streams are applied, postsynaptic outputs are generated at the synapses that received a spike. The resulting postsynaptic outputs are accumulated at the neuron. When the neuronal membrane potential exceeds a threshold, the output neuron fires, generating a spike. The synaptic weights are updated using an STDP rule; synapses that receive an input spike within a time window before (after) the neuronal spike get potentiated (depressed). As it is more likely that the temporally correlated inputs will eventually govern the neuronal firing events, the conductance of synapses receiving correlated inputs is expected to increase, whereas that of synapses whose inputs are uncorrelated is expected to decrease. Hence, the final steady-state distribution of the weights should display a separation between synapses receiving correlated and uncorrelated inputs.
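The accumulate-and-fire behavior of the neuron can be sketched as follows. This is a minimal illustration, not the neuron model used in the experiments: the function name is hypothetical, and the absence of leak and the reset of the membrane potential to zero after firing are assumptions.

```python
def integrate_and_fire(spike_trains, weights, threshold):
    """Return the time steps at which the neuron fires.

    spike_trains[t][i] is 1 if input i spikes at time step t, else 0.
    """
    potential = 0.0
    fired_at = []
    for t, spikes in enumerate(spike_trains):
        # Accumulate the postsynaptic outputs of the synapses that received a spike.
        potential += sum(w for w, s in zip(weights, spikes) if s)
        if potential > threshold:
            fired_at.append(t)
            potential = 0.0  # reset after firing (assumption)
    return fired_at


trains = [[1, 0], [0, 1], [1, 0], [1, 0], [1, 0]]
print(integrate_and_fire(trains, weights=[1.0, 2.0], threshold=2.5))  # [1, 4]
```

Inputs that tend to spike together push the potential over the threshold more often, which is why the STDP rule ends up strengthening the synapses that receive correlated streams.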

Fig. 5 Experimental demonstration of multi-memristive synapses used in a spiking neural network. a A spiking neural network is trained to perform the task of temporal correlation detection through unsupervised learning. Our network consists of 1000 multi-PCM synapses (in hardware) connected to one integrate-and-fire (I&F) software neuron. The synapses receive event-based data streams generated with Poisson distributions as presynaptic input spikes. 100 of the synapses receive correlated data streams with a correlation coefficient of 0.75, whereas the rest of the synapses receive uncorrelated data streams. The correlated and the uncorrelated data streams both have the same rate. The resulting postsynaptic outputs are accumulated at the neuronal membrane. The neuron fires, i.e., sends an output spike, if the membrane potential exceeds a threshold. The weight update amount is calculated using an exponential STDP rule based on the timing of the input spikes and the neuronal spikes. A potentiation (depression) pulse with fixed amplitude is applied if the desired weight change is higher (lower) than a threshold. b The synaptic weights are shown for synapses comprising N = 1, 3, and 7 PCM devices at the end of the experiment (5000 time steps). It can be seen that the weights of the synapses receiving correlated inputs tend to be larger than the weights of those receiving uncorrelated inputs. The weight distribution shows a clearer separation with increasing N. c Weight evolution of six synapses in the first 300 time steps of the experiment. The weight evolves more gradually with the number of devices per synapse. d Synaptic weight distribution of an SNN comprising 144,000 multi-PCM synapses with N = 7 PCM devices at the end of an experiment (3000 time steps) (upper panel). 14,400 synapses receive correlated input data streams with a correlation coefficient of 0.75. A total of 1,008,000 PCM devices are used for this large-scale experiment. 
The lower panel shows the synaptic weight distribution predicted by the PCM device model.

First, we perform small-scale experiments in which multi-memristive synapses with PCM devices are used to store the synaptic weights. The network comprises 1000 synapses, of which only 100 receive temporally correlated inputs with a correlation coefficient c of 0.75. The difficulty in detecting whether an input is correlated or not increases both with decreasing c and decreasing number of correlated inputs. Hence, detecting only 10% correlated inputs with c < 1 is a fairly difficult task and requires precise synaptic weight changes for the network to be trained effectively51. Each synapse comprises N PCM devices organized in a non-differential architecture. During the weight update of a synapse, a single potentiation pulse or a single depression pulse is applied to the device that the selection counter points to. A depression counter with a maximum value of 2 is incorporated for N > 1 to balance the PCM asymmetry. Figure 5(b) depicts the synaptic weights at the end of the experiment for different values of N. To quantify the separation of the weights receiving correlated and uncorrelated inputs, we set a threshold weight that leads to the lowest number of misclassifications. The number of misclassified inputs was 49, 8, and 0 for N = 1, 3, and 7, respectively. This demonstrates that the network’s ability to detect temporal correlations increases with the number of devices. This holds true even for lower values of the correlation coefficient as shown in Supplementary Note 6. With N = 1, there are strong abrupt fluctuations in the evolution of the conductance values because of the abrupt depression events as shown in Fig. 5(c). With N = 7, a more gradual potentiation and depression behavior is observed. For N = 7, the synapses receiving correlated and uncorrelated inputs can be perfectly separated at the end of the experiments. In contrast, for N = 1, the weights of synapses receiving correlated inputs display a wider distribution and numerous weights are misclassified.
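The threshold-based separation metric used above can be sketched as an exhaustive search over candidate thresholds. The function name and the convention that weights at or above the threshold are labeled as correlated are assumptions made for illustration.

```python
def best_threshold(weights, is_correlated):
    """Return (threshold, errors) with the fewest misclassifications.

    A synapse is classified as correlated if its weight is >= threshold.
    """
    # Every distinct weight is a candidate, plus one value above all weights.
    candidates = sorted(set(weights)) + [max(weights) + 1.0]
    best = None
    for thr in candidates:
        errors = sum((w >= thr) != c for w, c in zip(weights, is_correlated))
        if best is None or errors < best[1]:
            best = (thr, errors)
    return best


weights = [1.0, 2.0, 3.0, 8.0, 9.0]
labels = [False, False, True, True, True]
print(best_threshold(weights, labels))  # (3.0, 0)
```

A result of zero errors corresponds to perfectly separated weight distributions, as reported for N = 7.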

The multi-memristive synapse architecture is also scalable to larger network sizes. To demonstrate this, we repeated the above correlation experiment with 144,000 input streams, and with seven PCM devices per synapse, resulting in more than one million PCM devices in the network. As shown in Fig. 5(d), well-separated synaptic distributions have been achieved in the network at the end of the experiment. Moreover, a simulation was performed with the nonlinear PCM device model (see Methods). The simulation captures the separation of weights receiving correlated and uncorrelated inputs. In both experiment and simulation, ∼0.1% of the inputs were misclassified after training.