A spike-timing dependent plasticity (STDP) network is a neural network that uses unsupervised learning to build a predictive model of the world. The learning algorithm that adjusts the weights is an online iterative procedure that functions as a sort of expectation-maximization process. For each neuron in the network, the time elapsed since its most recent activation determines how the weights are updated at each time-step.

By default, each weight decays slowly toward 0. When two connected neurons fire within a short time of one another, the weight between them receives an abrupt adjustment whose sign is determined by the order in which they fired: if the presynaptic neuron fires first, the weight increases; if the postsynaptic neuron fires first, the weight decreases. Weights range between -4 and 4 and are initialized randomly when a network is created.
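This update rule can be sketched in a few lines. The weight range of [-4, 4], the decay toward 0, and the sign convention come from the description above; the specific decay rate, step sizes, and pairing window are illustrative assumptions, as is the vectorized form.

```python
import numpy as np

# The text specifies only the weight range [-4, 4] and decay toward 0;
# these rate constants are assumptions for illustration.
W_MIN, W_MAX = -4.0, 4.0
DECAY = 0.001      # per-step pull of every weight toward 0
POTENTIATE = 0.1   # applied when the presynaptic neuron fired first
DEPRESS = 0.1      # applied when the postsynaptic neuron fired first
WINDOW = 5         # "short time" in time-steps within which a pairing counts

def init_weights(n_pre, n_post, rng=None):
    """Random initial weights in the allowed range."""
    rng = rng or np.random.default_rng()
    return rng.uniform(W_MIN, W_MAX, size=(n_pre, n_post))

def stdp_step(w, t_pre, t_post):
    """One weight update given each neuron's time since its last spike.

    w:      (n_pre, n_post) weight matrix
    t_pre:  (n_pre,)  time-steps since each presynaptic neuron last fired
    t_post: (n_post,) time-steps since each postsynaptic neuron last fired
    """
    w = w * (1.0 - DECAY)                   # slow decay toward 0
    dt = t_pre[:, None] - t_post[None, :]   # dt > 0: pre fired earlier than post
    paired = np.abs(dt) <= WINDOW           # only nearby spikes interact
    w = np.where(paired & (dt > 0), w + POTENTIATE, w)  # pre first -> strengthen
    w = np.where(paired & (dt < 0), w - DEPRESS, w)     # post first -> weaken
    return np.clip(w, W_MIN, W_MAX)
```

Note that `dt > 0` means the presynaptic neuron fired longer ago, i.e. before the postsynaptic neuron, which is the case the text says strengthens the weight.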

STDP networks capture the temporal dynamics of the streaming data they are fed, because the update rule is inherently sensitive to the order in which events occur. In a sense, each weight represents the likelihood that the postsynaptic neuron will fire shortly after the presynaptic neuron. Each neuron receives the weighted outputs of every presynaptic neuron from the previous time-step and uses them to compute its own output, which functions as a prediction. That output is binary: the sum of the weighted inputs is compared against the neuron's threshold.
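The forward pass described here is simple enough to state directly. A minimal sketch, assuming per-neuron thresholds held in an array (the specific threshold values are not given in the text):

```python
import numpy as np

def forward(prev_spikes, w, thresholds):
    """Binary output: 1 where the weighted input sum exceeds the threshold.

    prev_spikes: (n_pre,)  binary outputs from the previous time-step
    w:           (n_pre, n_post) weight matrix
    thresholds:  (n_post,) per-neuron firing thresholds (illustrative values)
    """
    drive = prev_spikes @ w                   # weighted sum of presynaptic outputs
    return (drive > thresholds).astype(int)   # 1 = fire, 0 = stay silent
```

For example, with `prev_spikes = [1, 0, 1]` and weights `[[1.0, -2.0], [0.5, 3.0], [2.0, 0.5]]`, the drives are `[3.0, -1.5]`; against thresholds `[2.5, 0.0]` only the first output neuron fires.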

Patterns of activity across the network are strengthened through repetition, increasing their likelihood of recurring and thus their chances of being solidified further. Adaptation of this kind resembles an evolutionary process more than the corrective approach of supervised learning, because the initial conditions of an STDP network (i.e., the randomly assigned weights) largely dictate the final patterns of activity exhibited after training.

The patterns of activity that occur prior to training guide the future changes applied to the weights simply because they happen to be the patterns the random weights produce. Patterns that strengthen occur more reliably, and patterns that occur more reliably strengthen; this feedback loop is the backbone of the self-regulation exhibited by STDP networks.

In addition, competitive inhibition at the output layer ensures that only a subset of neurons can be active at any given time. Inhibiting weakly activated neurons interrupts the adjustment process: the variable that tracks each neuron's time since its last activation is not reset, so only the most responsive neurons update while the rest proceed as if no activity had occurred.

Because weights decay over time, inhibited neurons effectively 'miss out' on training and instead grow weaker with respect to a given pattern. This furthers self-regulation by preventing the output neurons from all converging on the same activation pattern: inhibited neurons are pushed away from patterns that are already sufficiently represented and driven to find their own unique representations. Through this zero-sum game at the output layer, the network maximizes its coverage of the state space and, in doing so, optimizes itself to model reality effectively.
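The inhibition mechanism can be sketched as follows. The text does not specify how the active subset is chosen, so a top-k ("k winners take all") rule is assumed here; the key behavior from the passage is that only winners reset their last-activation timers, leaving inhibited neurons to age and decay.

```python
import numpy as np

def k_winners(drive, k):
    """Indices of the k most strongly driven output neurons (assumed top-k rule)."""
    return np.argsort(drive)[-k:]

def inhibited_update(drive, time_since_spike, k):
    """Advance every neuron's clock; only the winners' clocks reset to 0.

    Inhibited neurons keep aging, so the subsequent STDP step treats them
    as if no activity occurred -- decay alone acts on their weights, and
    they drift away from the pattern the winners are learning.
    """
    time_since_spike = time_since_spike + 1   # every neuron ages one step
    winners = k_winners(drive, k)
    time_since_spike[winners] = 0             # only winners register a spike
    return time_since_spike, winners
```

With drives `[0.2, 3.0, 1.5, 0.1]` and `k=2`, neurons 1 and 2 win and reset to 0 while neurons 0 and 3 simply grow older, so repeated presentations of the same pattern progressively specialize the winners and free the losers to represent something else.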