Cooperative off-policy prediction of Markov decision processes in adaptive networks

Macua, Sergio Valcarcel; Chen, Jianshu; Zazo, Santiago; Sayed, Ali H.

doi:10.1109/ICASSP.2013.6638519

conference paper

2013

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We apply diffusion strategies to develop a cooperative reinforcement learning algorithm in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable for off-policy learning, even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The gain from cooperation, in the form of greater stability and lower bias and variance in the prediction error, is illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents differs from the target policy under evaluation.

Type

conference paper

DOI

10.1109/ICASSP.2013.6638519

Author(s)

Macua, Sergio Valcarcel

Chen, Jianshu

Zazo, Santiago

Sayed, Ali H.

Date Issued

2013

Publisher

IEEE

Published in

IEEE International Conference on Acoustics, Speech and Signal Processing

Start page

4539

End page

4543

Editorial or Peer reviewed

REVIEWED

Written at

OTHER

EPFL units

ASL

Event name

International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Event place

Vancouver, BC, Canada

Event date

May 26-31, 2013

Available on Infoscience

December 19, 2017

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/143340