Multi-agent actor-critic with time dynamical opponent model

Tian, Yuan; Kladny, Klaus-Rudolf; Wang, Qin; Huang, Zhiwu; Fink, Olga

doi:10.1016/j.neucom.2022.10.045

2023

Abstract

In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and each other. Since the agents adapt their policies during learning, not only the behavior of a single agent becomes non-stationary, but also the environment as perceived by the agent. This renders it particularly challenging to perform policy improvement. In this paper, we propose to exploit the fact that the agents seek to improve their expected cumulative reward and introduce a novel Time Dynamical Opponent Model (TDOM) to encode the knowledge that the opponent policies tend to improve over time. We motivate TDOM theoretically by deriving a lower bound of the log objective of an individual agent and further propose Multi-Agent Actor-Critic with Time Dynamical Opponent Model (TDOM-AC). We evaluate the proposed TDOM-AC on a differential game and the Multi-agent Particle Environment. We show empirically that TDOM achieves superior opponent behavior prediction during test time. The proposed TDOM-AC methodology outperforms state-of-the-art Actor-Critic methods on the performed tasks in cooperative and especially in mixed cooperative-competitive environments. TDOM-AC results in a more stable training and a faster convergence. Our code is available at https://github.com/Yuantian013/TDOM-AC. © 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Details

Title Multi-agent actor-critic with time dynamical opponent model

Author(s) Tian, Yuan; Kladny, Klaus-Rudolf; Wang, Qin; Huang, Zhiwu; Fink, Olga

Published in Neurocomputing

Volume 517

Pages 165-172

Date 2023-01-14

ISSN 0925-2312
1872-8286

Keywords

reinforcement learning; multi-agent reinforcement learning; multi-agent systems; opponent modeling; non-stationarity; level

DOI https://doi.org/10.1016/j.neucom.2022.10.045

Laboratories IMOS

Record Appears in Scientific production and competences > ENAC - School of Architecture, Civil and Environmental Engineering > IIC - Civil Engineering Institute > IMOS - Intelligent Maintenance and Operations Systems
Peer-reviewed publications
Work produced at EPFL
Journal Articles
Published

Record creation date 2023-01-16
