Gradient estimates of return
The exploration-exploitation trade-off that arises when one considers simple point estimates of expected returns no longer appears when full distributions are considered. This work develops a simple gradient-based approach for maintaining such distributions and investigates methods for using them to direct exploration.
Published in PASCAL Workshop in Principled Methods of Trading Exploration and Exploitation, London, UK, 2005
Record created on 2006-03-10, modified on 2016-08-08