Adaptive Gradient Descent without Descent

Malitsky, Yura; Mishchenko, Konstantin

conference paper

2020

Proceedings of the 37th International Conference on Machine Learning (ICML) (2020)

We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don’t increase the stepsize too fast and 2) don’t overstep the local curvature. No need for functional values, no line search, no information about the function except for the gradients. By following these rules, you get a method adaptive to the local geometry, with convergence guarantees depending only on the smoothness in a neighborhood of a solution. Given that the problem is convex, our method converges even if the global smoothness constant is infinite. As an illustration, it can minimize an arbitrary twice continuously differentiable convex function. We examine its performance on a range of convex and nonconvex problems, including logistic regression and matrix factorization.
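
The two rules in the abstract can be illustrated with a minimal Python sketch. This record does not state the update formulas, so the stepsize rule below is an assumption: the stepsize is taken as the smaller of a growth cap, sqrt(1 + theta) times the previous stepsize, and half the inverse of a local curvature estimate built from consecutive gradients. The names `adgd`, `lam0`, `iters`, and `quad_grad` are hypothetical, and the quadratic test problem is chosen only for illustration; the exact rule should be checked against the paper.

```python
import math


def adgd(grad, x0, lam0=1e-6, iters=500):
    """Gradient-only adaptive descent (sketch; the stepsize formula is an assumption)."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    # One tiny initial step provides a second point for the curvature estimate.
    x = [xi - lam0 * gi for xi, gi in zip(x_prev, g_prev)]
    lam_prev, theta = lam0, math.inf
    for _ in range(iters):
        g = grad(x)
        dx = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prev)))
        dg = math.sqrt(sum((a - b) ** 2 for a, b in zip(g, g_prev)))
        # Rule 1: do not increase the stepsize too fast.
        growth_cap = math.sqrt(1.0 + theta) * lam_prev
        # Rule 2: do not overstep the local curvature, estimated from gradients only.
        curvature_cap = dx / (2.0 * dg) if dg > 0.0 else math.inf
        lam = min(growth_cap, curvature_cap)
        if not 0.0 < lam < math.inf:
            break  # no usable stepsize (e.g. the gradient did not change)
        x_prev, g_prev = x, g
        x = [xi - lam * gi for xi, gi in zip(x, g)]
        theta, lam_prev = lam / lam_prev, lam
    return x


# Hypothetical test problem: f(x, y) = 0.5*(x - 1)**2 + 5*(y + 2)**2, minimized at (1, -2).
def quad_grad(p):
    return [p[0] - 1.0, 10.0 * (p[1] + 2.0)]


solution = adgd(quad_grad, [0.0, 0.0])
```

The sketch never evaluates the function itself and performs no line search; only gradients at consecutive iterates are used, which matches the description in the abstract.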

Name: Adaptive Gradient.pdf
Type: Preprint
Version: http://purl.org/coar/version/c_71e4c1898caa6e32
Access type: openaccess
Size: 850.25 KB
Format: Adobe PDF
Checksum (MD5): 20b6f6c91149a12891417dac467f485a

Name: ad_grad_icml.pdf
Type: Publisher's Version