STDP results from neural dynamics, and not from a specific optimality criterion
Jeffrey M. Beck1 and Lucas C. Parra2
1University of Rochester, 2City University of New York

Since the discovery of spike-timing-dependent plasticity (STDP), a variety of descriptive and normative explanations have been offered for its origin. Here, we show that STDP learning rules result from the stochastic dynamics of individual neurons and not from any specific optimality criterion. Specifically, we show that each iteration of a gradient-descent learning rule can be written as a local update rule that takes the form of traditional STDP but with a magnitude given by reward; that is, network weights are updated according to the covariance between reward and the STDP function. As a result, the overall shape of the STDP curve is not affected by the specific choice of reward function. Moreover, a global reward signal that modulates the learning rate is sufficient to optimize a wide class of objective functions.
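The update rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the double-exponential STDP kernel, its amplitudes and time constant, and the learning rate are conventional placeholder choices, and the global reward signal is taken as a single scalar per update.

```python
import numpy as np

def stdp_kernel(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Classic double-exponential STDP curve (illustrative parameters):
    potentiation when pre precedes post (dt = t_post - t_pre > 0),
    depression otherwise."""
    return np.where(dt > 0,
                    a_plus * np.exp(-dt / tau),
                    -a_minus * np.exp(dt / tau))

def reward_modulated_update(w, pre_spikes, post_spikes, reward, eta=1e-3):
    """One iteration: the usual STDP sum over all pre/post spike pairs,
    with its magnitude scaled by a global scalar reward signal."""
    dt = post_spikes[:, None] - pre_spikes[None, :]  # all post-minus-pre lags (ms)
    return w + eta * reward * stdp_kernel(dt).sum()
```

With zero reward the weight is left unchanged, so over many trials the net change tracks the covariance between reward and the STDP function, as stated above.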

This result is obtained for the standard linear-nonlinear-Poisson model, in which presynaptic spikes are linearly filtered in space and time to generate a "membrane potential". The probability of firing at any moment in time is then a nonlinear function of the instantaneous membrane potential; that is, spiking follows an inhomogeneous Poisson process. This model readily implements refractory behavior by treating after-hyperpolarization as a linear response to the neuron's own spikes, and it is an excellent generalization of integrate-and-fire dynamics. Moreover, the model is analytically tractable. Indeed, it has been shown (Paninski) that, for a given input/output spike pattern, a maximum-likelihood learning rule will converge to a unique set of network parameters (under certain conditions on the nonlinearity). In our formalism this corresponds to supervised learning with the log-likelihood as the reward function. Thus, STDP can learn to maximize the likelihood of observing a specific input/output spike-pattern relationship.
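A minimal sketch of this generative model follows. The exponential synaptic filter, the exponential nonlinearity, the baseline offset, and the time discretization are all illustrative assumptions chosen for simplicity, not the specific filters or nonlinearity used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_lnp(w, pre_spikes, dt=1.0, T=200, tau=10.0):
    """Linear-nonlinear-Poisson neuron: linearly filter presynaptic spikes
    into a 'membrane potential' u(t), pass it through a pointwise
    nonlinearity to get an instantaneous rate, and draw spikes from the
    resulting inhomogeneous Poisson process (Bernoulli approximation)."""
    t = np.arange(T) * dt
    # linear stage: exponentially filtered presynaptic spike train, scaled by w
    u = np.array([w * np.sum(np.exp(-(ti - pre_spikes[pre_spikes <= ti]) / tau))
                  for ti in t])
    rate = np.exp(u - 3.0)               # nonlinear stage (offset is arbitrary)
    spikes = rng.random(T) < rate * dt   # Poisson spiking, one Bernoulli draw per bin
    return u, rate, spikes
```

Because the exponential nonlinearity is convex and log-concave, the log-likelihood of an observed spike train is concave in w, which is the kind of condition under which the maximum-likelihood estimate cited above is unique.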

We then explore a number of possible reward functions by means of numerical simulations. These reward functions are motivated by concepts such as predictive encoding, independent component analysis, minimum quantization error, optimal linear decoding, mutual information maximization, description length minimization, and energy efficiency.