Code-specific policy gradient rules for spiking neurons

Sprekeler, Henning; Hennequin, Guillaume; Gerstner, Wulfram

conference paper

2009

Advances in Neural Information Processing Systems

Neural Information Processing Systems 22

Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems.
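As a rough illustration of the spike count case, the sketch below applies the REINFORCE rule of Williams (1992) to a single toy neuron. It is not the model from the paper: the Poisson spike count, the exponential rate function, the fixed input `x`, the Gaussian-shaped reward and the function name `reinforce_spike_count` are all assumptions chosen to keep the example short.

```python
import numpy as np


def reinforce_spike_count(n_trials=20000, lr=0.01, target=5, seed=0):
    """REINFORCE (Williams, 1992) for a toy Poisson spike-count neuron.

    All modelling choices here are illustrative assumptions, not taken
    from the paper: Poisson counts, rate = exp(w . x), bounded reward.
    """
    rng = np.random.default_rng(seed)
    x = np.array([1.0, 0.5])   # fixed presynaptic input (hypothetical)
    w = np.zeros(2)            # synaptic weights to be learned
    baseline = 0.0             # running average of past rewards

    for _ in range(n_trials):
        rate = np.exp(w @ x)   # expected spike count E[n]
        n = rng.poisson(rate)  # sampled spike count for this trial

        # Reward depends only on the spike count (the assumed "code");
        # it is bounded in (0, 1] and peaks when n equals the target.
        reward = np.exp(-((n - target) ** 2) / 8.0)

        # Score function: d/dw log Poisson(n | rate) = (n - rate) * x
        eligibility = (n - rate) * x

        # Policy-gradient update with a reward baseline
        w += lr * (reward - baseline) * eligibility
        baseline += 0.05 * (reward - baseline)

    return w, float(np.exp(w @ x))


if __name__ == "__main__":
    weights, final_rate = reinforce_spike_count()
    print("learned weights:", weights)
    print("expected spike count after learning:", final_rate)
```

The factor `(n - rate)` is the observed feature minus its expected value. This is the form the score function takes whenever the feature distribution belongs to a natural exponential family, of which the Poisson count with natural parameter `w @ x` is one instance.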

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/48987

Name: Sprekeler09.pdf

Access type: openaccess

Size: 903.45 KB

Format: Adobe PDF

Checksum (MD5): 759fb9086ad1cc74bce7799ae3193ab4