Gradient estimates of return distributions
We present a general method for maintaining estimates of the distribution of parameters in arbitrary models. This is then applied to estimating probability distributions over actions in value-based reinforcement learning. While this approach is similar to other techniques that maintain a confidence measure for action values, it nevertheless offers insight into current techniques and suggests avenues for further research.
- URL: http://publications.idiap.ch/downloads/papers/2005/dimitrakakis-pascal-2005.pdf
- Related documents: http://publications.idiap.ch/index.php/publications/showcite/dimitrakakis:rr05-29
- Record created on 2006-03-10, modified on 2016-08-08