Abstract

As robots start pervading human environments, the need for new interfaces that simplify human-robot interaction becomes more pressing. Robot Programming by Demonstration (RbD) develops intuitive ways of programming robots, taking inspiration from the strategies humans use to transmit knowledge to apprentices. The user-friendliness of RbD is meant to allow lay users with no prior knowledge of computer science, electronics, or mechanics to train robots to accomplish tasks the same way they would train a co-worker. When a trainer teaches a task to a robot, he or she demonstrates one particular way of fulfilling it. To learn from observing the trainer, the robot must first determine what the task entails (i.e. answer the so-called "What-to-imitate?" question) by inferring the user's intentions. Most importantly, the robot must also adapt its own controller to best fit the demonstration (the so-called "How-to-imitate?" question) despite differences in setup and embodiment. The latter question is the focus of this thesis. It relates to the problem of optimizing the reproduction of the task under environmental constraints.

The "How-to-imitate?" question is subdivided into two problems. The first, also known as the "correspondence problem", concerns resolving the discrepancies between the human demonstrator's body and the robot's body that prevent the robot from reproducing the task identically. Even though we simplified the problem by considering only humanoid platforms, that is, platforms with a joint configuration similar to that of a human, discrepancies in the number of degrees of freedom and in the range of motion remained. We resolved these by exploiting the redundant information conveyed through the demonstrations, collecting data in several frames of reference. By exploiting these redundancies in an algorithm comparable to damped least squares, we reproduce a trajectory that minimizes the error between the desired and reproduced trajectories across the frames of reference.

The second problem consists in reproducing a trajectory in an unknown setup while respecting the task constraints learned during training. When the information learned from the demonstrations no longer suffices to generalize the task constraints to a new setup, the robot must re-learn the task, this time through trial and error. Here we considered trial-and-error learning as a complement to RbD. By adding a trial-and-error module to the original Imitation Learning algorithm, the robot can find a solution better adapted to the context and to its embodiment than the solution found by RbD alone. Specifically, we compared Reinforcement Learning (RL) to other classical optimization techniques. We show that the system is advantageous in that: a) learning is more robust to unexpected events that were not encountered during the demonstrations, and b) the robot is able to optimize its own model of the task according to its own embodiment.
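
To give a feel for the multi-frame reproduction idea, here is a minimal sketch of how desired points expressed in several frames of reference can be fused into one commanded point in a weighted least-squares fashion, analogous to damped least squares. The function and variable names, the Gaussian-style precision weighting, and the numeric values are assumptions for illustration, not the thesis implementation.

```python
# Minimal sketch: fuse per-frame desired positions into one commanded position.
# Assumed names and weighting scheme; not the thesis's actual algorithm.
import numpy as np

def reproduce_step(mu_frames, sigma_frames, damping=1e-6):
    """Fuse desired positions mu_frames[i] (one per frame of reference),
    each with uncertainty sigma_frames[i], into a single commanded position.

    The result minimizes the sum of Mahalanobis distances to all frames,
    i.e. a weighted least-squares solution; the small damping term plays a
    role similar to the damping factor in damped least squares and keeps
    the system well conditioned when some frames carry little information."""
    dim = mu_frames[0].shape[0]
    precision_sum = damping * np.eye(dim)
    weighted_mu = np.zeros(dim)
    for mu, sigma in zip(mu_frames, sigma_frames):
        precision = np.linalg.inv(sigma)     # confidence of this frame
        precision_sum += precision
        weighted_mu += precision @ mu
    return np.linalg.solve(precision_sum, weighted_mu)

# Example: two frames disagree slightly; the more certain frame dominates.
x = reproduce_step(
    mu_frames=[np.array([0.30, 0.10, 0.25]), np.array([0.32, 0.12, 0.24])],
    sigma_frames=[np.diag([1e-4, 1e-4, 1e-4]), np.diag([1e-2, 1e-2, 1e-2])],
)
print(x)
```

Applied at every time step along the demonstrated trajectory, such a weighted fusion yields a reproduction that trades off the constraints observed in each frame of reference according to how consistently they were demonstrated.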
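
The trial-and-error module can likewise be illustrated with a very simple episodic search that refines the parameters obtained from demonstration. This stands in for the reinforcement-learning component described above; the reward function, parameter names, and hill-climbing update are assumptions for the example only.

```python
# Minimal sketch: refine demonstration-derived parameters by trial and error.
# A simple stochastic hill climber stands in for the RL module; all names,
# the reward, and the update rule are illustrative assumptions.
import numpy as np

def refine_by_trial_and_error(theta_demo, reward_fn, episodes=200,
                              sigma=0.05, rng=None):
    """Start from parameters theta_demo obtained by imitation and keep any
    random perturbation that improves the reward observed during a trial."""
    rng = np.random.default_rng() if rng is None else rng
    theta, best = theta_demo.copy(), reward_fn(theta_demo)
    for _ in range(episodes):
        candidate = theta + sigma * rng.standard_normal(theta.shape)
        r = reward_fn(candidate)      # one trial on the robot or in simulation
        if r > best:                  # keep improvements, discard failures
            theta, best = candidate, r
    return theta

# Toy usage: the "task" is to reach a target the demonstration missed slightly.
target = np.array([0.35, 0.15])
reward = lambda th: -np.linalg.norm(th - target)   # higher is better
theta_star = refine_by_trial_and_error(np.array([0.30, 0.10]), reward)
print(theta_star)
```

The point of the sketch is the workflow, not the particular optimizer: imitation provides a good initial solution, and trial-and-error exploration then adapts it to the robot's own embodiment and to situations not covered by the demonstrations.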
