General results from statistical learning theory suggest understanding not only brain computations, but also brain plasticity, as probabilistic inference; a model of plasticity in these terms has, however, been missing. We propose that inherently stochastic features of synaptic plasticity and spine motility enable cortical networks of neurons to carry out probabilistic inference by sampling from a posterior distribution of network configurations. This model provides a viable alternative to existing models that propose convergence of parameters to maximum likelihood values. It explains how priors on weight distributions and connection probabilities can be merged optimally with learned experience, how cortical networks can generalize learned information so well to novel experiences, and how they can compensate continuously for unforeseen disturbances of the network. The resulting new theory of network plasticity explains, from a functional perspective, a number of experimental observations on stochastic aspects of synaptic plasticity that previously appeared quite puzzling.

Synaptic connectivity between neurons in the brain and the efficacies (“weights”) of these synaptic connections are thought to encode the long-term memory of an organism. But a closer look at their molecular implementation, as well as imaging experiments over longer periods of time, has shown that synaptic connections are subject to numerous stochastic processes. We propose that this seeming unreliability of synaptic connections is not a bug but an important feature. It endows networks of neurons with a capability that has been observed experimentally but not understood theoretically: automatic compensation for internal and external changes. This perspective on network plasticity requires a new conceptual and mathematical framework, which is provided by this article. Stochasticity of synapses is seen here not as noise in an inherently deterministic system, but as an inherent property, just as the Brownian motion of particles cannot be abstracted away if one wants to understand certain properties of a physical system. In fact, we find that this underlying stochasticity of synaptic connections enables a network of neurons to continuously try out new network configurations while maintaining its functionality.

We demonstrate the resulting new style of modeling network plasticity in three examples. These examples show how the previously mentioned functional demands on network plasticity, such as incorporation of structural rules, automatic avoidance of overfitting, and inherent and immediate compensation for network perturbations, can be met by stochastic local plasticity processes. We focus here on common models for unsupervised learning in networks of neurons: generative models. We first develop the general learning theory for this class of models, and then describe applications to common non-spiking and spiking generative network models. Both structural plasticity (see [10, 11] for reviews) and synaptic plasticity (STDP) are integrated into the resulting theory of network plasticity.

This new model proposes to reexamine rules for synaptic plasticity. Rather than viewing trial-to-trial variability and ongoing fluctuations of synaptic parameters as the result of a suboptimal implementation of an inherently deterministic plasticity process, it proposes to model experimental data on synaptic plasticity by rules that consist of three terms: the standard (typically deterministic) activity-dependent (e.g., Hebbian or STDP) term that fits the model to external inputs, a second term that enforces structural rules (priors), and a third term that provides the stochastic driving force. This stochastic force enables network parameters to sample from the posterior, i.e., to fluctuate between different possible solutions of the learning task. The stochastic third term can be modeled by a standard formalism (a stochastic Wiener process) that was originally developed to model Brownian motion. The first two terms can be modeled as drift terms in a stochastic process. A key insight is that one can easily relate details of the resulting more complex rules for the dynamics of network parameters θ, which now become stochastic differential equations, to specific features of the resulting posterior distribution p*(θ ∣ x) of parameter vectors θ from which the network samples. Thereby, this theory provides a new framework for relating experimentally observed details of local plasticity mechanisms (including their typically stochastic implementation on the molecular scale) to functional consequences of network learning. For example, one obtains a theoretically founded framework for relating experimental data on spine motility to experimentally observed network properties, such as sparse connectivity, specific distributions of synaptic weights, and the capability to compensate for perturbations [9].
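As an illustration, the three-term structure of such a rule can be sketched as a stochastic differential equation integrated with the Euler–Maruyama method. The sketch below is a minimal one-dimensional toy model (all distributions and constants are our own illustrative choices, not taken from the article): a Gaussian prior plays the role of the structural term, the gradient of a Gaussian log-likelihood plays the role of the activity-dependent term, and a Wiener increment provides the stochastic driving force. In this conjugate case the stationary distribution of θ is a Gaussian posterior that can be checked in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D setup: Gaussian prior p_S(theta) = N(0, sigma_p^2),
# and unit-variance Gaussian likelihood of inputs x with mean theta.
sigma_p = 2.0
x = rng.normal(1.5, 1.0, size=20)      # stand-in "network inputs"

def grad_log_prior(theta):
    return -theta / sigma_p**2          # d/dtheta log N(theta; 0, sigma_p^2)

def grad_log_lik(theta):
    return np.sum(x - theta)            # d/dtheta sum_n log N(x_n; theta, 1)

# Euler-Maruyama integration of the three-term rule
#   dtheta = b * (grad_log_prior + grad_log_lik) dt + sqrt(2 b) dW
b, dt, steps = 0.1, 1e-3, 200_000
theta, samples = 0.0, []
for _ in range(steps):
    drift = b * (grad_log_prior(theta) + grad_log_lik(theta))
    theta += drift * dt + np.sqrt(2 * b * dt) * rng.normal()
    samples.append(theta)

# The stationary distribution is the posterior N(mu_post, var_post).
var_post = 1.0 / (1.0 / sigma_p**2 + len(x))
mu_post = var_post * np.sum(x)
print(np.mean(samples[steps // 2:]), mu_post)
```

Discarding the first half of the trajectory as burn-in, the empirical mean and variance of the sampled θ match the analytic posterior, illustrating how drift terms plus Wiener noise yield sampling from p*(θ ∣ x) rather than convergence to a point.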

We therefore propose to view network plasticity as a process that continuously moves the high-dimensional network parameters θ within some low-dimensional manifold that represents a compromise between overriding structural rules and different ways of fitting the internal model to external inputs x. We propose that ongoing stochastic fluctuations (not unlike Brownian motion) continuously drive the network parameters θ within this low-dimensional manifold. The primary conceptual innovation is the departure from the traditional view of learning as moving parameters to values θ* that represent optimal (or locally optimal) fits to network inputs x. We show that our alternative view can be turned into a precise learning model within the framework of probability theory. This new model satisfies theoretical requirements for handling priors, such as structural constraints and rules, in a principled manner; these requirements have previously been formulated and explored in the context of artificial neural networks [6, 7], as have more recent challenges that arise from probabilistic brain models [8]. The low-dimensional manifold of parameters θ that becomes the new learning goal in our model can be characterized mathematically as the high-probability region of the posterior distribution p*(θ ∣ x) of network parameters θ. This posterior arises as the product of a general prior p_𝒮(θ) for network parameters (which enforces structural rules) with a term that describes the quality of the current internal model (e.g., in a predictive coding or generative modeling framework: the likelihood p_𝒩(x ∣ θ) of inputs x for the current parameter values θ of the network 𝒩). More precisely, we propose that brain plasticity mechanisms are designed to enable brain networks to sample from this posterior distribution p*(θ ∣ x) through inherent stochastic features of their molecular implementation.
In this way synaptic and other plasticity processes are able to carry out probabilistic (or Bayesian) inference by sampling from a posterior distribution that takes into account both structural rules and the fit to external inputs. Hence this model provides a solution to the challenge posed in [8]: to understand how posterior distributions of weights can be represented and learned by networks of neurons in the brain.

Other experimental data point to surprising ongoing fluctuations in dendritic spines and spine volumes, to some extent even in the adult brain [1] and in the absence of synaptic activity [2]. A significant portion of axonal side branches and axonal boutons was also found to appear and disappear within a week in adult visual cortex, even in the absence of imposed learning and lesions [3]. Furthermore, surprising random drifts of tuning curves of neurons in motor cortex have been observed [4]. Apart from such continuously ongoing changes in synaptic connections and tuning curves of neurons, massive changes in synaptic connectivity were found to accompany functional reorganization of primary visual cortex after lesions, see e.g. [5].

This view of network plasticity has been challenged on several grounds. From the theoretical perspective it is problematic because in the absence of an intelligent external controller it is likely to lead to overfitting of the internal model to the inputs x it has received, thereby reducing its capability to generalize learned knowledge to new inputs. Furthermore, networks of neurons in the brain are apparently exposed to a multitude of internal and external changes and perturbations, to which they have to respond quickly in order to maintain stable functionality.

We reexamine in this article the conceptual and mathematical framework for understanding the organization of plasticity in networks of neurons in the brain. We focus on synaptic plasticity and network rewiring (spine motility), but our framework is also applicable to other network plasticity processes. One commonly assumes that plasticity moves network parameters θ (such as synaptic connections between neurons and synaptic weights) to values θ* that are optimal for the current computational function of the network. In learning theory, this view is made precise, for example, as maximum likelihood learning, where model parameters θ are moved to values θ* that maximize the fit of the resulting internal model to the inputs x that impinge on the network from its environment (by maximizing the likelihood of these inputs x). The convergence to θ* is often assumed to be facilitated by some external regulation of learning rates that reduces the learning rate as the network approaches an optimal solution.

Results

We present a new theoretical framework for analyzing and understanding the local plasticity mechanisms of networks of neurons in the brain as stochastic processes that generate specific distributions p(θ) of the network parameters θ, over which these parameters fluctuate. This framework can be used to analyze and model many types of learning processes. We illustrate it here for the case of unsupervised learning, i.e., learning without a teacher or rewards. Many learning processes in biological organisms are of this nature, especially learning processes in early sensory areas and in other brain areas that have to provide and maintain an adequate level of functionality on their own, even in the face of internal or external perturbations.

A common framework for modeling unsupervised learning in networks of neurons is provided by generative models, which date back to the 19th century, when Helmholtz proposed that perception could be understood as unconscious inference [12]. Since then the hypothesis of the “generative brain” has received considerable attention, fueling interest in various aspects of the relation between Bayesian inference and the brain [8, 13, 14]. The basic assumption of the “Bayesian brain” theory is that the activity z of neuronal networks in the brain can be viewed as an internal model for hidden variables in the outside world that give rise to sensory experiences x (such as the response x of auditory sensory neurons to spoken words that are guessed by an internal model z). The internal model z is usually assumed to be represented by the activity of neurons in the network, e.g., in terms of firing rates or spatio-temporal spike patterns. A network 𝒩 of stochastically firing neurons is modeled in this framework by a probability distribution p_𝒩(x, z ∣ θ) that describes the probabilistic relationships between N input patterns x = x1, …, xN and corresponding network responses z = z1, …, zN, where θ denotes the vector of network parameters that shape this distribution, e.g., via synaptic efficacies and network connectivity. The marginal probability p_𝒩(x ∣ θ) = ∑_z p_𝒩(x, z ∣ θ) of the actually occurring inputs x = x1, …, xN under the resulting internal model of the neural network 𝒩 with parameters θ can then be viewed as a measure of the agreement between this internal model (which carries out “predictive coding” [15]) and its environment (which generates the inputs x).
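To make this notation concrete, the following toy sketch (with hypothetical parameter values chosen only for illustration, not taken from the article) implements a minimal generative model with a single binary hidden cause z per input, computes the marginal likelihood p_𝒩(x ∣ θ) by explicit summation over z, and inverts the model to infer the hidden cause from an observation.

```python
import numpy as np

# Illustrative generative model: hidden cause z in {0, 1} drawn from p(z | theta);
# observation x drawn from a unit-variance Gaussian whose mean depends on z.
theta = {"prior_z": np.array([0.4, 0.6]),   # p(z | theta)
         "means": np.array([-1.0, 2.0])}    # mean of p(x | z, theta)

def p_x_given_z(x, z):
    # Gaussian density N(x; mean_z, 1)
    return np.exp(-0.5 * (x - theta["means"][z]) ** 2) / np.sqrt(2 * np.pi)

def marginal_likelihood(x):
    # p(x | theta) = sum over hidden causes z of p(z | theta) * p(x | z, theta)
    return sum(theta["prior_z"][z] * p_x_given_z(x, z) for z in (0, 1))

def posterior_z(x):
    # Bayesian inference over the hidden cause: p(z | x, theta)
    joint = np.array([theta["prior_z"][z] * p_x_given_z(x, z) for z in (0, 1)])
    return joint / joint.sum()

print(marginal_likelihood(1.7))   # agreement of the model with input x = 1.7
print(posterior_z(1.7))           # inferred distribution over the hidden cause
```

Here `marginal_likelihood` plays the role of the agreement measure p_𝒩(x ∣ θ) described above, while `posterior_z` corresponds to the network activity z interpreted as inference over hidden causes of the input.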

The goal of network learning is usually described in this probabilistic generative framework as finding parameter values θ* that maximize this agreement, or equivalently the likelihood of the inputs x (maximum likelihood learning):

θ* = argmax_θ p_𝒩(x ∣ θ).   (1)

Locally optimal parameter solutions are usually determined by gradient ascent on the data likelihood p_𝒩(x ∣ θ).
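A minimal sketch of this maximum likelihood view, using a hypothetical unit-variance Gaussian model chosen for illustration: gradient ascent on the log-likelihood with a decaying learning rate drives θ to the single optimum θ*, which in this case is simply the sample mean of the inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.7, 1.0, size=50)        # stand-in "network inputs"

# Gradient ascent on log p(x | theta) for the model N(theta, 1).
# An externally regulated, decaying learning rate yields convergence to theta*.
theta = 0.0
for t in range(1, 5001):
    grad = np.sum(x - theta)             # gradient of the Gaussian log-likelihood
    theta += grad / (len(x) * (t + 1))   # learning rate decays over time

print(theta, x.mean())                   # theta converges to the ML value x.mean()
```

This deterministic convergence to a point estimate is exactly the behavior that the synaptic sampling model replaces with ongoing stochastic movement over the posterior.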

Online synaptic sampling

For online learning one assumes that the likelihood p_𝒩(x ∣ θ) = p_𝒩(x1, …, xN ∣ θ) of the network inputs factorizes:

p_𝒩(x ∣ θ) = ∏_{n=1}^{N} p_𝒩(xn ∣ θ),   (4)

i.e., each network input xn can be explained as being drawn individually from p_𝒩(xn ∣ θ), independently of the other inputs. The weight update rule Eq (3) depends on all inputs x = x1, …, xN, hence synapses would have to keep track of the whole set of network inputs for the exact dynamics (batch learning). In an online scenario, we assume that only the current network input xn is available for synaptic sampling. One then arrives at the following online approximation to Eq (3):

dθi = b (∂/∂θi log p_𝒮(θ) + N · ∂/∂θi log p_𝒩(xn ∣ θ)) dt + √(2b) dWi.   (5)

Note the additional factor N in this rule. It compensates for the N-fold application of the first and last term of Eq (5) when one moves through all N inputs xn. Although convergence to the correct posterior distribution cannot be guaranteed theoretically for this online rule, we show in Methods that it is a reasonable approximation to the batch rule Eq (3). Furthermore, all subsequent simulations are based on this online rule, which demonstrates the viability of this approximation.
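The effect of the factor N can be checked numerically. The sketch below (a one-dimensional toy model with our own illustrative prior, likelihood, and constants, not taken from the article) cycles through N inputs one at a time, scales each single-input likelihood gradient by N, and adds a Wiener increment; the resulting samples of θ match the posterior computed from all N inputs at once.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(1.0, 1.0, size=10)       # N hypothetical inputs
N, sigma_p = len(x), 2.0                # sigma_p: width of the Gaussian prior

# Online variant of the synaptic sampling rule: at each step only the current
# input x_n is visible, and its likelihood gradient is scaled by N.
b, dt, steps = 0.05, 1e-3, 400_000
theta, samples, n = 0.0, [], 0
for _ in range(steps):
    grad_prior = -theta / sigma_p**2    # structural (prior) term
    grad_lik_n = x[n] - theta           # gradient for the single input x_n
    drift = b * (grad_prior + N * grad_lik_n)
    theta += drift * dt + np.sqrt(2 * b * dt) * rng.normal()
    samples.append(theta)
    n = (n + 1) % N                     # move on to the next input

# Batch posterior computed from all N inputs at once, for comparison.
var_post = 1.0 / (1.0 / sigma_p**2 + N)
mu_post = var_post * np.sum(x)
print(np.mean(samples[steps // 2:]), mu_post)
```

Averaged over one pass through the inputs, the N-scaled single-input gradient equals the full batch gradient, which is why the online trajectory still samples (approximately) from the correct posterior.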