The exploration-exploitation trade-off that arises when one considers simple point estimates of expected returns no longer appears when full distributions are considered. This work develops a simple gradient-based approach for maintaining such distributions and investigates methods for using them to direct exploration.
Name: dimitrakakis-idiap-rr-05-29.pdf
Access type: openaccess
Size: 170.97 KB
Format: Adobe PDF
Checksum (MD5): bf84b6bfb6af35518146a1ad79bf0ece