Infoscience

doctoral thesis

Models of Reward-Modulated Spike-Timing-Dependent Plasticity

Frémaux, Nicolas  
2013

How do animals learn to repeat behaviors that lead to obtaining food or other “rewarding” objects? As a biologically plausible paradigm for learning in spiking neural networks, spike-timing-dependent plasticity (STDP) has been shown to perform well in unsupervised learning tasks such as receptive field development. However, STDP fails to take behavioral relevance into account, and as such is inadequate to explain a vast range of learning tasks in which the final outcome, conditioned on the prior execution of a series of actions, is signaled to an animal through sparse rewards. In this thesis, I show that the addition of a third, global, reward-based factor to the pre- and postsynaptic factors of STDP is a promising solution to this problem, consistent with experimental findings. On the one hand, dopamine is a neuromodulator which has been shown to encode reward signals in the brain. On the other hand, STDP has been shown to be affected by dopamine, even though the precise nature of the interaction is unclear. Moreover, the theoretical framework of reinforcement learning provides a strong foundation for the analysis of these learning rules. After studying existing examples of such rules in a navigation task, I derive simple functional requirements for reward-modulated learning rules, and illustrate these in a motor learning task. One of those functional requirements is the existence of a “critic” structure, constantly evaluating the potential for rewarding events. The implications of the existence of such a critic for the interpretation of psychophysical experiments are also discussed. Finally, I propose a biologically plausible implementation of such a structure that performs motor or navigational tasks. This is based on a generalization of temporal difference learning, a well-known reinforcement learning framework, to continuous time, well suited to an implementation with spiking neurons.
These results provide a unified picture of reward-modulated learning rules: even though different rules have been proposed, these can be reduced to a single model at the synaptic level, with variations in the computation of the neuromodulatory signal enabling switching between different learning rules.
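
To make the three-factor idea in the abstract concrete, the following is a minimal sketch and not the model developed in the thesis. It assumes an exponential STDP window, an exponentially decaying eligibility trace, and a critic whose only effect is to subtract an expected reward from the delivered reward. All function names, parameters, and constants (`stdp_window`, `r_stdp_update`, `tau_e`, `lr`, and so on) are hypothetical and chosen only for illustration.

```python
import math


def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Illustrative Hebbian STDP window; dt = t_post - t_pre in ms."""
    if dt >= 0:
        # pre before post: potentiation
        return a_plus * math.exp(-dt / tau)
    # post before pre: depression
    return -a_minus * math.exp(dt / tau)


def r_stdp_update(pairings, reward, expected_reward, t_reward,
                  tau_e=500.0, lr=0.01):
    """Weight change for one synapse under an assumed three-factor rule.

    pairings: list of (t_pre, t_post) spike pairs in ms, all before t_reward.
    The pre- and postsynaptic factors build an eligibility trace; the third,
    global factor (reward minus the critic's expectation) gates the change.
    """
    eligibility = 0.0
    for t_pre, t_post in pairings:
        t_pair = max(t_pre, t_post)  # time at which the pairing is complete
        decay = math.exp(-(t_reward - t_pair) / tau_e)
        eligibility += stdp_window(t_post - t_pre) * decay
    modulator = reward - expected_reward  # assumed form of the global signal
    return lr * modulator * eligibility


# Usage: a causal pairing followed by an unexpected reward strengthens the synapse.
dw = r_stdp_update([(10.0, 15.0)], reward=1.0, expected_reward=0.0, t_reward=100.0)
```

Under these assumptions, a fully predicted reward (`reward == expected_reward`) leaves the weight unchanged, and setting `expected_reward` to zero gives a rule gated by the raw reward. This is one simple way to picture how changing only the computation of the neuromodulatory signal can switch between learning rules.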

Type
doctoral thesis
DOI
10.5075/epfl-thesis-5683
Author(s)
Frémaux, Nicolas  
Advisors
Gerstner, Wulfram  
Jury
P. Fua (president), A. Ijspeert, Y. Loewenstein, W. Senn
Date Issued
2013
Publisher
EPFL
Publisher place
Lausanne
Public defense year
2013-04-24
Thesis number
5683
Subjects
Spiking neurons • Synaptic plasticity • Spike-timing-dependent plasticity • Reinforcement learning • Neuromodulation • Dopamine • Reward

EPFL units
LCN  
Faculty
IC  
School
ISIM  
Doctoral School
EDIC  
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/92045

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved