Since the discovery of spike-timing-dependent plasticity (STDP), a variety of descriptive and normative explanations have been offered for its origin. Here, we show that STDP learning rules result from the stochastic dynamics of individual neurons rather than from any specific optimality criterion. Specifically, we show that each iteration of a gradient-descent learning rule can be written as a local update that takes the form of traditional STDP but with a magnitude set by reward, i.e., network weights are updated according to the covariance between reward and the STDP function. As a result, the overall shape of the STDP curve is unaffected by the specific choice of reward function. Moreover, a global reward signal that modulates the learning rate is sufficient to optimize a wide class of objective functions.
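The reward-modulated update described above can be sketched as follows. This is an illustrative toy implementation, not the paper's derivation: the exponential STDP window, its parameter values (`a_plus`, `a_minus`, `tau`), and the learning rate are all assumed for the example.

```python
import numpy as np

def stdp_kernel(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Classic exponential STDP window (illustrative constants).

    dt = t_post - t_pre in ms; positive dt (pre before post) gives
    potentiation, negative dt gives depression.
    """
    return np.where(dt >= 0,
                    a_plus * np.exp(-dt / tau),
                    -a_minus * np.exp(dt / tau))

def reward_modulated_update(pre_spikes, post_spikes, reward, lr=0.01):
    """Weight change for one synapse: learning rate times a global
    reward signal times the summed STDP contributions over all
    pre/post spike pairs. The reward only scales the update; the
    shape of the STDP curve itself is unchanged."""
    dts = post_spikes[:, None] - pre_spikes[None, :]  # all pairwise differences
    return lr * reward * stdp_kernel(dts).sum()
```

Note that `reward` enters only as a multiplicative factor on the update magnitude, which is the sense in which the STDP curve's shape is independent of the choice of reward function.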
This result is obtained for the standard linear-nonlinear Poisson model, wherein presynaptic spikes are linearly filtered in space and time to generate a "membrane potential". The probability of firing at any moment is then a nonlinear function of the instantaneous membrane potential, i.e., spiking follows an inhomogeneous Poisson process. This model readily implements refractory behavior by treating after-hyperpolarization as a linear response to the neuron's own spikes, and it generalizes integrate-and-fire dynamics well. Moreover, the model is analytically tractable. Indeed, it has been shown (Paninski) that, for a given input/output spike pattern, a maximum-likelihood learning rule converges to a unique set of network parameters (under certain conditions on the nonlinearity). In our formalism this corresponds to supervised learning with the log-likelihood as the reward function. Thus, STDP can learn to maximize the likelihood of observing a specific input/output spike-pattern relationship.
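A minimal simulation of the model described above might look like the following. This is a sketch under assumed choices: the exponential nonlinearity, discrete time bins, and the bias and kernel values are illustrative, not specified in the text.

```python
import numpy as np

def simulate_lnp(inputs, w, h, dt=1.0, bias=-2.0, seed=0):
    """Linear-nonlinear Poisson neuron with spike-history feedback.

    inputs: (T, N) binary presynaptic spike array
    w:      (N,)  feedforward weights (the linear spatial filter)
    h:      (K,)  after-hyperpolarization kernel applied to the
                  neuron's own recent spikes; negative values suppress
                  firing after a spike, implementing refractoriness
    Returns a length-T binary output spike train.
    """
    rng = np.random.default_rng(seed)
    T = inputs.shape[0]
    spikes = np.zeros(T)
    for t in range(T):
        u = bias + inputs[t] @ w                 # linearly filtered input
        for j in range(1, min(len(h), t) + 1):   # linear response to own spikes
            u += h[j - 1] * spikes[t - j]
        rate = np.exp(u)                         # exponential nonlinearity
        p = 1.0 - np.exp(-rate * dt)             # Poisson spike prob. per bin
        spikes[t] = float(rng.random() < p)
    return spikes
```

The exponential nonlinearity used here is one of the log-concave choices under which the model's log-likelihood has a unique maximum, which is what makes the maximum-likelihood correspondence above well posed.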
We then explore a number of possible reward functions by means of
numerical simulations. These reward functions are motivated by concepts
such as predictive encoding, independent component analysis, minimum
quantization error, optimal linear decoding, mutual information
maximization, description length minimization, and energy efficiency.