Authors: Macua, Sergio Valcarcel; Chen, Jianshu; Zazo, Santiago; Sayed, Ali H.
Deposited: 2017-12-19
Published: 2013
DOI: 10.1109/ICASSP.2013.6638519
URL: https://infoscience.epfl.ch/handle/20.500.14299/143340
Title: Cooperative off-policy prediction of Markov decision processes in adaptive networks
Type: conference paper (conference proceedings)

Abstract: We apply diffusion strategies to propose a cooperative reinforcement learning algorithm in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable for off-policy learning even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The gain from cooperation, in the form of greater stability and reduced bias and variance in the prediction error, is illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents differs from the target policy under evaluation.
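
The abstract describes a diffusion strategy in which each agent runs a local prediction update and then combines estimates with its neighbors. The following is a minimal illustrative sketch of that adapt-then-combine pattern using tabular TD(0) value prediction on a toy Markov chain; the network topology, combination weights, step-size, and chain are all hypothetical choices, not the paper's exact algorithm or experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state Markov chain shared by all agents (hypothetical example).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # transition matrix under the target policy
r = np.array([1.0, 0.0])            # expected reward in each state
gamma = 0.9

n_agents = 4
# Doubly-stochastic combination matrix over an assumed ring network:
# each agent weights itself 0.5 and its two neighbors 0.25 each.
A = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

phi = np.eye(2)                     # tabular (one-hot) features
theta = np.zeros((n_agents, 2))     # one value estimate per agent
mu = 0.02                           # constant step-size

states = rng.integers(0, 2, size=n_agents)
for _ in range(20000):
    # Adapt step: each agent performs a local TD(0) update on its own sample.
    psi = np.empty_like(theta)
    for k in range(n_agents):
        s = states[k]
        s_next = rng.choice(2, p=P[s])
        delta = r[s] + gamma * theta[k] @ phi[s_next] - theta[k] @ phi[s]
        psi[k] = theta[k] + mu * delta * phi[s]
        states[k] = s_next
    # Combine step: each agent averages the intermediate estimates
    # of its neighbors, diffusing information through the network.
    theta = A @ psi

# Compare the network-averaged estimate with the true value function
# V = (I - gamma * P)^{-1} r.
V = np.linalg.solve(np.eye(2) - gamma * P, r)
err = np.max(np.abs(theta.mean(axis=0) - V))
print(err < 0.5)
```

The combine step is what distinguishes the diffusion strategy from independent learners: cooperation averages out the sampling noise across agents, which is the stability and variance benefit the abstract refers to.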