To test whether equation 2.5 could be used to train a single neuron to emit a predefined target spike pattern, we simulated a single LIF neuron that received a set of 100 spike trains as inputs. The target spike train was chosen as five equidistant spikes over an interval of 500 ms. The inputs were drawn as Poisson spike trains that repeated every 500 ms. We initialized the weights in a regime in which the output neuron showed only subthreshold dynamics and did not spike (see Figure 2a). Previous methods, starting from this quiescent state, would require the introduction of noise to generate spiking, which would in turn retard the speed with which precise output spike times could be learned. Weight updates were computed by evaluating the integral in equation 2.5 over a fixed interval and scaling the result by the learning rate (see section 3). After 500 trials, corresponding to 250 s of simulated time, the output neuron had learned to produce the desired output spike train (see Figure 2b), and good approximations to the target spike train emerged after far fewer trials (see Figure 2c).
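A minimal sketch of such a training loop is given below. All parameter values, the fast-sigmoid surrogate derivative, and the simplified per-trial update (a crude stand-in for the integral in equation 2.5, without the double filtering of the full rule) are assumptions for illustration, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not from the text).
dt = 1e-3            # time step (s)
T = 0.5              # trial duration: 500 ms
steps = int(T / dt)
n_in = 100           # number of input spike trains
tau_mem = 10e-3      # membrane time constant
tau_syn = 5e-3       # synaptic time constant
v_th = 1.0           # spike threshold (dimensionless)
lr = 1e-3            # learning rate

# Frozen Poisson inputs (here 5 Hz) that repeat identically every trial.
inputs = rng.random((steps, n_in)) < 5.0 * dt

# Target: five equidistant spikes over the 500 ms interval.
target = np.zeros(steps)
target[(np.arange(1, 6) * steps) // 6] = 1.0

def surrogate_grad(v):
    # Fast-sigmoid surrogate derivative (a common choice; assumed here).
    return 1.0 / (1.0 + 10.0 * np.abs(v - v_th)) ** 2

def run_trial(w):
    """Simulate one trial of a current-based LIF neuron."""
    v, i_syn = 0.0, np.zeros(n_in)
    out = np.zeros(steps)
    v_trace = np.zeros(steps)
    psp = np.zeros((steps, n_in))   # per-synapse PSP traces
    for t in range(steps):
        i_syn += -dt / tau_syn * i_syn + inputs[t]
        v += dt / tau_mem * (-v + w @ i_syn)
        v_trace[t] = v
        if v >= v_th:
            out[t] = 1.0
            v = 0.0                 # reset after a spike
        psp[t] = i_syn
    return out, v_trace, psp

w = rng.normal(0.0, 1e-3, n_in)     # weak weights: subthreshold regime
for trial in range(500):
    out, v_trace, psp = run_trial(w)
    err = target - out              # instantaneous output spike-train error
    # Per-trial update: error x surrogate derivative x presynaptic PSP.
    w += lr * (err * surrogate_grad(v_trace)) @ psp
```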

4.1 Learning in Multilayer Spiking Neural Networks

Having established that our rule can efficiently transform complex spatiotemporal input spike patterns into precisely timed output spike trains in a network without hidden units, we next investigated how well the same rule would perform in multilayer networks. The form of equation 2.5 suggests a straightforward extension to hidden layers in analogy to backprop. Namely, we can use the same learning rule, equation 2.5, for hidden units, with the modification that the error signal becomes a complicated function that depends on the weights and future activity of all downstream neurons. However, this nonlocality in space and time presents serious problems in terms of both biological plausibility and technical feasibility. Technically, this computation requires either backpropagation through time through the PSP kernel or the computation of all relevant quantities online as, for instance, in the case of RTRL. Here, we explore an approach akin to the latter, since our specific choice of temporal kernels allows us to compute all relevant dynamic quantities and error signals online (see Figure 3b). In our approach, error signals are distributed directly through a feedback matrix to the hidden-layer units (see Figure 3a). Specifically, this means that the output error signals are propagated through neither the actual nor the “soft” spiking nonlinearity. This idea is closely related to the notion of straight-through estimators in machine learning (Hinton, 2012; Bengio et al., 2013; Baldi et al., 2016). We investigated different configurations of the feedback matrix, which can be (1) symmetric (i.e., the transpose of the feedforward weights), as in the case of backprop; (2) random, as motivated by recent results on feedback alignment (Lillicrap et al., 2016); or (3) uniform, corresponding most closely to a single global third factor distributed to all neurons, akin to a diffuse neuromodulatory signal.
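The three feedback configurations can be illustrated with a short sketch. Layer sizes, weight scales, and the example error value are arbitrary assumptions; the point is only how a given output error is routed to the hidden layer while skipping the spiking nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hidden, n_out = 4, 1

# Illustrative feedforward readout weights (hidden layer -> output).
W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))

# Three ways to route the output error back to the hidden layer:
B_symmetric = W_out.T                               # (1) transpose of feedforward weights
B_random = rng.normal(0.0, 1.0, (n_hidden, n_out))  # (2) fixed random feedback
B_uniform = np.ones((n_hidden, n_out))              # (3) uniform global signal

e = np.array([0.5])     # example output error signal at one time step
hidden_errors = {name: B @ e for name, B in
                 [("symmetric", B_symmetric),
                  ("random", B_random),
                  ("uniform", B_uniform)]}
```

Note that in all three cases the hidden-layer error is a plain linear readout of the output error, consistent with the straight-through idea of not propagating the error through the spiking nonlinearity.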

We first sought to replicate the task shown in Figure 2, but with the addition of a hidden layer composed of four LIF neurons. Initially, we tested learning with random feedback. To that end, feedback weights were drawn from a zero mean unit variance gaussian, and their value remained fixed during the entire simulation. The synaptic feedforward weights were also initialized randomly at a level at which neither the hidden units nor the output unit fired a single spike in response to the same input spike trains as used before (see Figure 4a). After training the network for 40 s, some of the hidden units had started to fire spikes in response to the input. Similarly, the output neuron had started to fire at intermittent intervals closely resembling the target spike train (not shown). Continued training on the same task for a total of 250 s led to a further refinement of the output spike train and more differentiated firing patterns in a subset of the hidden units (see Figure 4b).

Although we did not restrict synaptic connectivity to obey Dale's principle, in the example with random feedback, all hidden neurons with positive feedback connections ended up being excitatory, whereas neurons with negative feedback weights generally turned out to be inhibitory at the end of training. These dynamics are a direct manifestation of the feedback alignment aspect of random feedback learning (Lillicrap et al., 2016). Because the example shown in Figure 4 does not strictly require inhibitory neurons in the hidden layer, in many cases the neurons with negative feedback remained quiescent or at low activity levels at the end of learning.

Learning was successful for different initial conditions, although the time to convergence to zero cost varied (see Figure 4d). We did encounter a few cases in which the network completely failed to solve the task. These were cases in which all feedback connections happened to be initialized with negative values (see Figure 4c). This eventuality could be made very unlikely, however, by increasing the number of hidden units (see Figure 4c). Other than that, we did not find any striking differences in performance when we replaced the random feedback connections with symmetric (see Figure 4d) or uniform “all one” feedback weights (see Figure 4e).

The previous task was simple enough that solving it did not require a hidden layer. We therefore investigated whether SuperSpike could also learn to solve tasks that cannot be solved by a network without hidden units. To that end, we constructed a spiking exclusive-or task in which four different spiking input patterns had to be separated into two classes. In this example, we used 100 input units, although the effective dimension of the problem was two by construction. Specifically, we picked three nonoverlapping sets of input neurons with associated fixed random firing times in a 10 ms window. One set was part of all patterns and served as a time reference. The other two sets were combined to yield the four input patterns of the problem. Moreover, we added a second readout neuron so that each readout corresponded to one of the two target classes (see Figure 5a). The input patterns were presented in random order as short bouts of spiking activity, separated by random intertrial intervals during which input neurons fired stochastically at 4 Hz (see Figure 5b). Because of the added noise, we relaxed the requirement for precise temporal spiking and instead required output neurons to spike within a narrow window of opportunity, which was aligned with and outlasted each stimulus by 15 ms. The output error signal was zero unless the correct output neuron failed to fire within the window, in which case an error signal corresponding to the correct output was elicited at the end of the window. At any time, an incorrect output spike triggered immediate negative feedback. We trained the network comparing the different types of feedback. A network with random feedback quickly learned to solve this task with perfect accuracy (see Figures 5b and 5c), whereas a network without hidden units was unable to solve it (see Figure 5d).
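The construction of the four input patterns described above can be sketched as follows. The partition sizes of the three neuron sets are assumptions (the text specifies only that they are nonoverlapping), as is the representation of a pattern as (neuron, spike-time) pairs.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in = 100
window = 10e-3      # 10 ms pattern window

# Three nonoverlapping sets of input neurons (set sizes are assumed).
ids = rng.permutation(n_in)
ref, set_a, set_b = ids[:34], ids[34:67], ids[67:]

# One fixed random firing time per neuron, drawn inside the window.
times = rng.random(n_in) * window

def make_pattern(use_a, use_b):
    """Return (neuron_id, spike_time) pairs for one XOR input pattern."""
    active = np.concatenate(
        [ref] + ([set_a] if use_a else []) + ([set_b] if use_b else []))
    return [(int(i), float(times[i])) for i in active]

# The reference set appears in all four patterns; the class label is the
# XOR of whether set_a and set_b are active.
patterns = {(a, b): make_pattern(a, b) for a in (0, 1) for b in (0, 1)}
labels = {k: k[0] ^ k[1] for k in patterns}
```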
Perhaps not surprisingly, networks with symmetric feedback connections also learned the task quickly, and overall their learning curves were more stereotyped and less noisy (see Figure 5e), whereas networks with uniform feedback performed worse on average (see Figure 5f). Overall, these results illustrate that temporally coding spiking multilayer networks can be trained to solve tasks that cannot be solved by networks without hidden layers. Moreover, they show that random feedback can be advantageous over uniform feedback in some cases.