conference paper
Gradient estimates of return distributions
2005
PASCAL Workshop on Principled Methods of Trading Exploration and Exploitation
We present a general method for maintaining estimates of the distribution of parameters in arbitrary models. This is then applied to the estimation of probability distributions over actions in value-based reinforcement learning. While this approach is similar to other techniques that maintain a confidence measure for action-values, it nevertheless offers an insight into current techniques and hints at potential avenues of further research.
Name: dimitrakakis-pascal-2005.pdf
Access type: openaccess
Size: 89.27 KB
Format: Adobe PDF
Checksum (MD5): b12d493953db609b9b7041ef06c7fe63