Models of Reward-Modulated Spike-Timing-Dependent Plasticity

Frémaux, Nicolas

doi:10.5075/epfl-thesis-5683

2013

Abstract

How do animals learn to repeat behaviors that lead to obtaining food or other “rewarding” objects? As a biologically plausible paradigm for learning in spiking neural networks, spike-timing-dependent plasticity (STDP) has been shown to perform well in unsupervised learning tasks such as receptive field development. However, STDP fails to take behavioral relevance into account, and as such is inadequate to explain a vast range of learning tasks in which the final outcome, conditioned on the prior execution of a series of actions, is signaled to an animal through sparse rewards. In this thesis, I show that the addition of a third, global, reward-based factor to the pre- and postsynaptic factors of STDP is a promising solution to this problem, consistent with experimental findings. On the one hand, dopamine is a neuromodulator that has been shown to encode reward signals in the brain. On the other hand, STDP has been shown to be affected by dopamine, even though the precise nature of the interaction is unclear. Moreover, the theoretical framework of reinforcement learning provides a strong foundation for the analysis of these learning rules. After studying existing examples of such rules in a navigation task, I derive simple functional requirements for reward-modulated learning rules, and illustrate these in a motor learning task. One of those functional requirements is the existence of a “critic” structure that constantly evaluates the potential for rewarding events. The implications of the existence of such a critic for the interpretation of psychophysical experiments are also discussed. Finally, I propose a biologically plausible implementation of such a structure that performs motor or navigational tasks. This is based on a generalization of temporal difference learning, a well-known reinforcement learning framework, to continuous time, which is well suited to an implementation with spiking neurons.
These results provide a unified picture of reward-modulated learning rules: even though different rules have been proposed, these can be reduced to a single model at the synaptic level, with variations in the computation of the neuromodulatory signal enabling switching between different learning rules.
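
To make the three-factor idea in the abstract concrete, the following is a minimal sketch and not the thesis's actual model. It assumes a single synapse, a trial-based setting, a pair-based exponential STDP window, and a neuromodulatory signal equal to the reward minus a critic's prediction. The function name `weight_change` and all parameters (`a_plus`, `a_minus`, `tau_stdp`, `learning_rate`, `expected_reward`) are hypothetical names chosen for illustration.

```python
import math


def weight_change(pre_spikes, post_spikes, reward, expected_reward,
                  a_plus=1.0, a_minus=1.0, tau_stdp=20.0, learning_rate=0.01):
    """Illustrative three-factor update for one synapse over one trial.

    pre_spikes, post_spikes: spike times in ms (hypothetical inputs).
    reward: scalar reward received at the end of the trial.
    expected_reward: prediction supplied by an assumed "critic".
    """
    # Factors 1 and 2: pre- and postsynaptic spike pairs are passed through
    # an exponential STDP window and summed into an eligibility value.
    # (Assumption: no decay of eligibility within the trial.)
    eligibility = 0.0
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            dt = t_post - t_pre
            if dt > 0:    # pre before post: potentiating contribution
                eligibility += a_plus * math.exp(-dt / tau_stdp)
            elif dt < 0:  # post before pre: depressing contribution
                eligibility -= a_minus * math.exp(dt / tau_stdp)

    # Factor 3: a global signal, here assumed to be the reward relative to
    # the critic's prediction. Other choices of this signal would yield
    # other learning rules with the same synaptic model.
    modulation = reward - expected_reward

    return learning_rate * modulation * eligibility


# One pre-before-post pairing under three reward outcomes.
dw_better = weight_change([10.0], [15.0], reward=1.0, expected_reward=0.5)
dw_expected = weight_change([10.0], [15.0], reward=0.5, expected_reward=0.5)
dw_worse = weight_change([10.0], [15.0], reward=0.0, expected_reward=0.5)
```

In this sketch the same spike pairing strengthens the synapse when the outcome is better than predicted, leaves it unchanged when the outcome matches the prediction, and weakens it when the outcome is worse. Only the computation of `modulation` would need to change to obtain a different rule, which is the sense in which the abstract speaks of a single synaptic model.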

Details

Title Models of Reward-Modulated Spike-Timing-Dependent Plasticity

Author(s) Frémaux, Nicolas

Advisor(s)

Gerstner, Wulfram

Date 2013

Publisher Lausanne, EPFL

Keywords (free)

Spiking neurons; Synaptic plasticity; Spike-timing-dependent plasticity; Reinforcement learning; Neuromodulation; Dopamine; Reward

Language English

DOI https://doi.org/10.5075/epfl-thesis-5683

Other identifier(s) urn: urn:nbn:ch:bel-epfl-thesis5683-4

Laboratories LCN

The document appears in Scientific production and competences > SV - School of Life Sciences > BMI - Brain Mind Institute > LCN - Laboratory of Computational Neuroscience (IC/SV)
Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > LCN - Laboratory of Computational Neuroscience (IC/SV)
Scientific production and competences > EPFL Theses
Work produced at EPFL
Published
Theses

Record creation date 2013-05-08
