Infoscience

research article

Multi-agent actor-critic with time dynamical opponent model

Tian, Yuan
•
Kladny, Klaus-Rudolf
•
Wang, Qin
January 14, 2023
Neurocomputing

In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and with each other. Because the agents adapt their policies during learning, both the behavior of each individual agent and the environment as that agent perceives it become non-stationary. This makes policy improvement particularly challenging. In this paper, we propose to exploit the fact that the agents seek to improve their expected cumulative reward, and we introduce a novel Time Dynamical Opponent Model (TDOM) to encode the knowledge that the opponent policies tend to improve over time. We motivate TDOM theoretically by deriving a lower bound of the log objective of an individual agent, and we further propose Multi-Agent Actor-Critic with Time Dynamical Opponent Model (TDOM-AC). We evaluate the proposed TDOM-AC on a differential game and on the Multi-agent Particle Environment. We show empirically that TDOM achieves superior opponent behavior prediction at test time. The proposed TDOM-AC methodology outperforms state-of-the-art Actor-Critic methods on the performed tasks in cooperative and especially in mixed cooperative-competitive environments. TDOM-AC also yields more stable training and faster convergence. Our code is available at https://github.com/Yuantian013/TDOM-AC.

(c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Type
research article
DOI
10.1016/j.neucom.2022.10.045
Web of Science ID

WOS:000884436700013

Author(s)
Tian, Yuan
Kladny, Klaus-Rudolf
Wang, Qin
Huang, Zhiwu
Fink, Olga  
Date Issued

2023-01-14

Published in
Neurocomputing
Volume

517

Start page

165

End page

172

Subjects

Computer Science, Artificial Intelligence

•

Computer Science

•

reinforcement learning

•

multi-agent reinforcement learning

•

multi-agent systems

•

opponent modeling

•

non-stationarity

•

level

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
IMOS  
Available on Infoscience
January 16, 2023
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/193811

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved