000199259 001__ 199259
000199259 005__ 20180913062523.0
000199259 020__ $$a978-1-4673-6358-7
000199259 022__ $$a2153-0858
000199259 02470 $$2ISI$$a000331367403046
000199259 037__ $$aCONF
000199259 245__ $$aTransfer in Inverse Reinforcement Learning for Multiple Strategies
000199259 260__ $$aNew York$$bIEEE$$c2013
000199259 269__ $$a2013
000199259 300__ $$a7
000199259 336__ $$aConference Papers
000199259 490__ $$aIEEE International Conference on Intelligent Robots and Systems
000199259 520__ $$aWe consider the problem of incrementally learning different strategies for performing a complex sequential task from multiple demonstrations by an expert or a set of experts. While the task is the same, each expert differs in his/her way of performing it. We assume that this variety across the experts' demonstrations arises because each expert/strategy is driven by a different reward function, where each reward function is expressed as a linear combination of a set of known features. Consequently, we can learn all the expert strategies by forming a convex set of optimal deterministic policies, from which one can match any unseen expert strategy drawn from this set. Instead of learning every optimal policy in this set from scratch, the learner transfers knowledge from the set of learned policies to bootstrap its search for the new optimal policy. We demonstrate our approach on a simulated mini-golf task in which the 7-degrees-of-freedom Barrett WAM robot arm learns to sequentially putt on different holes in accordance with the playing strategies of the expert.
000199259 700__ $$0246728$$aTanwani, Ajay Kumar$$g216104
000199259 700__ $$0240594$$aBillard, Aude$$g115671
000199259 7112_ $$aIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
000199259 720_1 $$aAmato, N.$$eed.
000199259 773__ $$q3244-3250$$t2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
000199259 909C0 $$0252119$$pLASA$$xU10660
000199259 909CO $$ooai:infoscience.tind.io:199259$$pconf$$pSTI
000199259 917Z8 $$x115671
000199259 937__ $$aEPFL-CONF-199259
000199259 973__ $$aEPFL$$rREVIEWED$$sPUBLISHED
000199259 980__ $$aCONF