Alternation makes the adversary weaker in two-player games
Motivated by alternating game-play in two-player games, we study an alternating variant of Online Linear Optimization (OLO). In alternating OLO, a learner at each round t ∈ [T] selects a vector x_t and then an adversary selects a cost vector c_t ∈ [−1, 1]^n. The learner then experiences cost (c_t + c_{t−1})^⊤ x_t instead of (c_t)^⊤ x_t as in standard OLO. We establish that under this small twist, the Ω(√T) lower bound on the regret is no longer valid. More precisely, we present two online learning algorithms for alternating OLO that respectively admit O((log n)^{4/3} T^{1/3}) regret for the n-dimensional simplex and O(ρ log T) regret for the ball of radius ρ > 0. Our results imply that in alternating game-play, an agent can always guarantee Õ((log n)^{4/3} T^{1/3}) regret regardless of the strategies of the other agent, while the regret bound improves to O(log T) in case the agent admits only two actions.
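To make the protocol concrete, here is a minimal Python sketch of alternating OLO on the simplex, where the learner's round-t cost is (c_t + c_{t−1})^⊤ x_t and regret is measured against the best fixed simplex point in hindsight. The Hedge-style learner, the step size eta, and the helper names are illustrative assumptions for the simulation only, not the algorithms from the paper that achieve the stated bounds.

```python
import numpy as np

def hedge_update(weights, cost, eta):
    """Multiplicative-weights step; a generic baseline learner, not the paper's algorithm."""
    weights = weights * np.exp(-eta * cost)
    return weights / weights.sum()

def alternating_olo(costs, eta=0.1):
    """Simulate alternating OLO on the n-dimensional simplex.

    costs: (T, n) array of adversary cost vectors c_t in [-1, 1]^n.
    At round t the learner pays (c_t + c_{t-1})^T x_t, with c_0 = 0.
    Returns the learner's total cost and its regret against the best
    fixed simplex point (a vertex, since the objective is linear).
    """
    T, n = costs.shape
    x = np.full(n, 1.0 / n)       # start uniform on the simplex
    prev_c = np.zeros(n)          # convention: c_0 = 0
    total = 0.0
    cum = np.zeros(n)             # cumulative effective cost per coordinate
    for t in range(T):
        c = costs[t]
        eff = c + prev_c          # effective cost vector (c_t + c_{t-1})
        total += eff @ x          # learner commits to x before seeing c_t
        cum += eff
        x = hedge_update(x, eff, eta)  # update only after paying, as in standard OLO
        prev_c = c
    best_fixed = cum.min()        # best fixed vertex in hindsight
    return total, total - best_fixed

rng = np.random.default_rng(0)
costs = rng.uniform(-1.0, 1.0, size=(1000, 4))
total, regret = alternating_olo(costs)
print(f"total cost {total:.2f}, regret {regret:.2f}")
```

Note the structural advantage the paper exploits: when choosing x_t, the learner has already observed c_{t−1}, which contributes half of the round's effective cost; the baseline above ignores this and so only illustrates the protocol, not the improved T^{1/3} or log T rates.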