Infoscience
conference paper

Towards the Transferability of Rewards Recovered Via Regularized Inverse Reinforcement Learning

Schlaginhaufen, Andreas
•
Kamgarpour, Maryam
January 1, 2024
Advances in Neural Information Processing Systems 37 (NeurIPS 2024)
38th Annual Conference on Neural Information Processing Systems

Inverse reinforcement learning (IRL) aims to infer a reward from expert demonstrations, motivated by the idea that the reward, rather than the policy, is the most succinct and transferable description of a task [Ng et al., 2000]. However, the reward corresponding to an optimal policy is not unique, making it unclear if an IRL-learned reward is transferable to new transition laws in the sense that its optimal policy aligns with the optimal policy corresponding to the expert's true reward. Past work has addressed this problem only under the assumption of full access to the expert's policy, guaranteeing transferability when learning from two experts with the same reward but different transition laws that satisfy a specific rank condition [Rolland et al., 2022]. In this work, we show that the conditions developed under full access to the expert's policy cannot guarantee transferability in the more practical scenario where we have access only to demonstrations of the expert. Instead of a binary rank condition, we propose principal angles as a more refined measure of similarity and dissimilarity between transition laws. Based on this, we then establish two key results: 1) a sufficient condition for transferability to any transition laws when learning from at least two experts with sufficiently different transition laws, and 2) a sufficient condition for transferability to local changes in the transition law when learning from a single expert. Furthermore, we also provide a probably approximately correct (PAC) algorithm and an end-to-end analysis for learning transferable rewards from demonstrations of multiple experts.
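The abstract proposes principal angles as a graded measure of similarity between transition laws, in place of a binary rank condition. Principal angles between two subspaces can be computed from the SVD of the product of their orthonormal bases. A minimal numerical sketch, assuming the subspaces are given as column spans of matrices; the function name and example matrices are illustrative and not from the paper:

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (in radians) between the column spaces of A and B.

    Computed via the SVD of Q_A^T Q_B, where Q_A and Q_B are orthonormal
    bases obtained from QR decompositions of A and B.
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # The singular values are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# A plane in R^3 vs. a line orthogonal to it: the single principal angle is pi/2.
plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
line = np.array([[0.0], [0.0], [1.0]])
print(principal_angles(plane, line))
```

Identical subspaces yield all-zero angles, while orthogonal subspaces yield angles of π/2, giving the refined notion of "sufficiently different" transition laws the abstract alludes to.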

Type
conference paper
Web of Science ID

WOS:001633253000160

Author(s)
Schlaginhaufen, Andreas  

École Polytechnique Fédérale de Lausanne

Kamgarpour, Maryam  

École Polytechnique Fédérale de Lausanne

Editors
Globerson, A
•
Mackey, L
•
Belgrave, D
•
Fan, A
•
Paquet, U
•
Tomczak, J
•
Zhang, C
Date Issued

2024-01-01

Publisher

Neural Information Processing Systems (NIPS)

Publisher place

La Jolla

Published in
Advances in Neural Information Processing Systems 37 (NeurIPS 2024)
ISBN of the book

979-8-3313-1438-5

Series title/Series vol.

Advances in Neural Information Processing Systems; 37

ISSN (of the series)

1049-5258

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
SYCAMORE  
Event name
38th Annual Conference on Neural Information Processing Systems
Event acronym
NeurIPS 2024
Event place
Vancouver Convention Center
Event date
2024-12-10 - 2024-12-15

Funder
Swiss Data Science Center

Available on Infoscience
February 24, 2026
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/260692
  • Contact
  • infoscience@epfl.ch


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved