A new regret analysis for Adam-type algorithms

In this paper, we focus on a theory-practice gap for Adam and its variants (AMSGrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter β₁ (typically between 0.9 and 0.99). In theory, regret guarantees for online convex optimization require a rapidly decaying schedule β₁ → 0. We show that this is an artifact of the standard analysis and propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant β₁, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of algorithms and settings.
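For reference, below is a minimal sketch (Python/NumPy, not taken from the paper) of a single Adam-type update, contrasting the constant β₁ used in practice with an illustrative decaying schedule β₁,t = β₁/t of the kind earlier regret analyses assume. The function name, default values, and decay schedule are assumptions for illustration only.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8,
              decay_beta1=False):
    # One Adam-style update on parameters theta given gradient grad.
    # By default beta1_t stays constant (the practical setting the paper
    # analyzes); decay_beta1=True uses an illustrative beta1/t schedule,
    # mimicking the rapidly decaying schedules required by earlier analyses.
    beta1_t = beta1 / t if decay_beta1 else beta1
    m = beta1_t * m + (1.0 - beta1_t) * grad        # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                  # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# usage (illustrative), with t starting at 1:
#   theta, m, v = adam_step(theta, grad, m, v, t)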


Published in:
Proceedings of the 37th International Conference on Machine Learning (ICML)
Presented at:
37th International Conference on Machine Learning (ICML 2020), Virtual, July 12-18, 2020
Year:
2020




 Record created 2020-06-15, last modified 2020-07-14
