The exploration-exploitation trade-off that arises when one considers simple point estimates of expected returns no longer appears when full distributions are considered. This work develops a simple gradient-based approach for maintaining such distributions and investigates methods for using them to direct exploration.
Type
report
Author(s)
Date Issued
2005
Publisher
IDIAP
Subjects
Note
Published in PASCAL Workshop in Principled Methods of Trading Exploration and Exploitation, London, UK, 2005
Written at
EPFL
EPFL units
Available on Infoscience
March 10, 2006