Title: Alternation makes the adversary weaker in two-player games
Authors: Cevher, Volkan; Cutkosky, Ashok; Kavis, Ali; Piliouras, Georgios; Skoulakis, Efstratios Panteleimon; Viano, Luca
Published: 2023
Available: 2024-07-01
Handle: https://infoscience.epfl.ch/handle/20.500.14299/208913
Subject: AI-ML
Type: text::conference output::conference paper not in proceedings

Abstract: Motivated by alternating game-play in two-player games, we study an alternating variant of Online Linear Optimization (OLO). In alternating OLO, a learner at each round $t \in [T]$ selects a vector $x_t$ and then an adversary selects a cost vector $c_t \in [-1,1]^n$. The learner then experiences cost $(c_t + c_{t-1})^\top x_t$ instead of $c_t^\top x_t$ as in standard OLO. We establish that under this small twist, the $\Omega(\sqrt{T})$ lower bound on the regret is no longer valid. More precisely, we present two online learning algorithms for alternating OLO that respectively admit $O((\log n)^{4/3} T^{1/3})$ regret for the $n$-dimensional simplex and $O(\rho \log T)$ regret for the ball of radius $\rho > 0$. Our results imply that in alternating game-play, an agent can always guarantee $\tilde{O}((\log n)^{4/3} T^{1/3})$ regret regardless of the strategies of the other agent, while the regret bound improves to $O(\log T)$ when the agent has only two actions.
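For concreteness, the regret notion implied by the abstract can be written out as below. This is a sketch of the standard formalization under two assumptions not stated in the record: the comparator is charged the same alternating costs as the learner, and $c_0 = 0$ by convention.

$$
\mathcal{R}_T \;=\; \sum_{t=1}^{T} (c_t + c_{t-1})^\top x_t \;-\; \min_{x \in \mathcal{X}} \sum_{t=1}^{T} (c_t + c_{t-1})^\top x,
$$

where $\mathcal{X}$ is the learner's feasible set (the $n$-dimensional simplex or the ball of radius $\rho$, in the two settings the abstract considers).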