Gradient estimates of return distributions

Dimitrakakis, Christos; Bengio, Samy

conference paper


2005

PASCAL Workshop on Principled Methods of Trading Exploration and Exploitation

We present a general method for maintaining estimates of the distribution of parameters in arbitrary models. This is then applied to the estimation of probability distributions over actions in value-based reinforcement learning. While this approach is similar to other techniques that maintain a confidence measure for action-values, it nevertheless offers insight into current techniques and suggests avenues for further research.
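The abstract does not spell out the estimator. What follows is therefore only a minimal Python sketch of the general idea it describes, not the authors' algorithm. It maintains a distribution estimate per action and turns those estimates into a probability distribution over actions. The sketch makes several assumptions of its own. Each action's value is modelled as a Gaussian. The mean and the return variance are both tracked by stochastic-gradient (delta-rule) steps with a hypothetical `step_size`. The uncertainty of the mean is taken as the steady-state variance of an exponential moving average of i.i.d. returns, `step_size / (2 - step_size) * var`. Finally, the probability that each action is best is estimated by Monte Carlo sampling. The names `GaussianValueEstimate`, `mean_variance` and `action_probabilities` are illustrative only.

```python
import math
import random
from typing import List, Optional, Sequence


class GaussianValueEstimate:
    """Hypothetical per-action estimate: running mean and variance of observed
    returns, each moved by a stochastic-gradient (delta-rule) step."""

    def __init__(self, step_size: float = 0.1, init_var: float = 1.0) -> None:
        assert 0.0 < step_size <= 1.0
        self.step_size = step_size
        self.mean = 0.0
        self.var = init_var  # estimated variance of individual returns

    def update(self, ret: float) -> None:
        err = ret - self.mean
        # Gradient step on the squared error 0.5 * (ret - mean)^2.
        self.mean += self.step_size * err
        # Gradient step on 0.5 * (err^2 - var)^2 with respect to var; the new
        # var is a convex mix of the old var and err^2, so it stays >= 0.
        self.var += self.step_size * (err * err - self.var)

    def mean_variance(self) -> float:
        # Assumption: steady-state variance of an exponential moving average
        # of i.i.d. returns with variance `var`.
        return self.step_size / (2.0 - self.step_size) * self.var


def action_probabilities(estimates: Sequence[GaussianValueEstimate],
                         n_samples: int = 10_000,
                         rng: Optional[random.Random] = None) -> List[float]:
    """Monte Carlo estimate of P(action a has the largest mean value)."""
    if rng is None:
        rng = random.Random(0)
    wins = [0] * len(estimates)
    for _ in range(n_samples):
        # Draw one plausible mean value per action and credit the argmax.
        draws = [rng.gauss(e.mean, math.sqrt(e.mean_variance())) for e in estimates]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]


if __name__ == "__main__":
    # Two hypothetical actions whose returns are N(1, 1) and N(0, 1).
    env_rng = random.Random(1)
    arms = [GaussianValueEstimate(), GaussianValueEstimate()]
    for _ in range(500):
        arms[0].update(env_rng.gauss(1.0, 1.0))
        arms[1].update(env_rng.gauss(0.0, 1.0))
    print(action_probabilities(arms, rng=random.Random(2)))
```

These probabilities could drive exploration, for example by sampling actions from them. They could also serve as a confidence measure over action-values. Both uses are left open here, because the abstract does not say how the paper uses its distributions.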

Name: dimitrakakis-pascal-2005.pdf
Access type: open access
Size: 89.27 KB
Format: Adobe PDF
Checksum (MD5): b12d493953db609b9b7041ef06c7fe63