Abstract

In this work, we study graph-based multi-armed bandit (MAB) problems aimed at optimizing actions on irregular and high-dimensional graphs. More formally, we consider a decision-maker that takes sequential actions over time and observes the experienced reward, defined as a function of a sparse graph signal. The goal is to learn the action policy that maximizes the reward accumulated over time. The main challenges are the system uncertainty (i.e., the unknown parameters of the sparse graph signal model) and the high-dimensional search space. The uncertainty can be addressed by online learning strategies that infer the system dynamics while taking the appropriate actions; however, high dimensionality makes such strategies highly data-inefficient. To overcome this limitation, we propose a novel graph-based MAB algorithm that remains data-efficient even in high-dimensional systems. The key intuition is to infer the nature of the graph processes by learning in the graph-spectral domain, and to exploit this knowledge while optimizing the actions. In particular, we model the graph signal with a sparse dictionary-based representation, and we propose an online sequential decision strategy that learns the parameters of the graph processes while optimizing the action strategy.
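To make the high-level loop concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm) of a spectral bandit: the eigenvectors of the graph Laplacian serve as the dictionary, the unknown reward signal is generated by sparse spectral coefficients, and the learner alternates between picking a node optimistically (UCB-style bonus) and refitting the coefficients from the rewards observed so far. The function name, the ridge refit (a stand-in for the sparsity-promoting estimator the abstract implies), and all parameter values are illustrative assumptions.

```python
import numpy as np

def spectral_bandit_sketch(L, alpha_true, horizon, noise=0.1, seed=0):
    """Hypothetical sketch of a graph-spectral MAB loop.

    L          : graph Laplacian (n x n); its eigenvectors form the dictionary.
    alpha_true : sparse spectral coefficients generating the (unknown) reward.
    Each round: pick a node (arm) optimistically, observe a noisy sample of
    the graph signal there, and refit the spectral coefficients.
    """
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    _, U = np.linalg.eigh(L)       # spectral dictionary (graph Fourier basis)
    signal = U @ alpha_true        # true per-node reward, unknown to learner
    X, y, picks = [], [], []
    alpha_hat = np.zeros(n)
    for t in range(horizon):
        est = U @ alpha_hat        # predicted reward at every node
        counts = np.bincount(np.asarray(picks, dtype=int), minlength=n)
        bonus = np.sqrt(2.0 * np.log(t + 2) / (counts + 1))  # exploration
        arm = int(np.argmax(est + bonus))
        picks.append(arm)
        r = signal[arm] + noise * rng.standard_normal()
        X.append(U[arm]); y.append(r)
        A, b = np.array(X), np.array(y)
        # Ridge refit of the spectral coefficients; the abstract instead
        # suggests a sparsity-promoting fit -- this is a simple stand-in.
        alpha_hat = np.linalg.solve(A.T @ A + 0.1 * np.eye(n), A.T @ b)
    return picks, alpha_hat
```

Learning in the spectral domain is what keeps the sketch data-efficient: when the signal is sparse in the dictionary, far fewer samples than nodes suffice to pin down the reward landscape.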

Details

Actions