Gradient estimates of return distributions

We present a general method for maintaining estimates of the distribution of parameters in arbitrary models. We then apply it to estimating probability distributions over actions in value-based reinforcement learning. While this approach resembles other techniques that maintain a confidence measure for action values, it nevertheless offers insight into those techniques and suggests avenues for further research.
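The abstract does not specify the update rule, but the general idea of maintaining a distributional estimate by gradient steps can be illustrated with a minimal sketch: fit a Gaussian to observed returns for one action by stochastic gradient descent on the negative log-likelihood, parameterizing the variance on a log scale for stability. All names and constants below are assumptions for illustration, not the paper's method.

```python
import math
import random

class GaussianReturnEstimate:
    """Gaussian estimate of one action's return distribution, updated by
    stochastic gradient descent on the negative log-likelihood.
    Illustrative sketch only; not the update rule from the paper."""

    def __init__(self, mu=0.0, log_var=0.0, lr=0.02):
        self.mu = mu          # estimated mean return
        self.log_var = log_var  # log of estimated variance (kept positive)
        self.lr = lr          # gradient step size

    @property
    def var(self):
        return math.exp(self.log_var)

    def update(self, ret):
        err = ret - self.mu
        # d(NLL)/d(mu) = (mu - ret) / var
        grad_mu = -err / self.var
        # d(NLL)/d(log_var) = 1/2 - (ret - mu)^2 / (2 var)
        grad_log_var = 0.5 - err ** 2 / (2.0 * self.var)
        self.mu -= self.lr * grad_mu
        self.log_var -= self.lr * grad_log_var

# Usage: feed returns drawn from N(2.0, 0.5^2); the estimate should
# approach mean 2.0 and variance 0.25.
random.seed(0)
est = GaussianReturnEstimate()
for _ in range(5000):
    est.update(random.gauss(2.0, 0.5))
```

A confidence measure for the action then falls out directly: the estimated variance quantifies uncertainty about the return, which is the kind of quantity exploration strategies can exploit.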


Published in:
PASCAL Workshop on Principled Methods of Trading Exploration and Exploitation
Year:
2005
Note:
IDIAP-RR 05-29
 Record created 2006-03-10, last modified 2018-03-17
