On the performance of random reshuffling in stochastic learning

In empirical risk optimization, it has been observed that gradient descent implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data randomly and independently of each other. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. Some of these justifications rely on loose bounds, or their conclusions are dependent on the sample size which is problematic for large datasets. This work focuses on constant step-size adaptation, where the agent is continuously learning. In this case, convergence is only guaranteed to a small neighborhood of the optimizer albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms independent sampling by showing that the iterate at the end of each run approaches a smaller neighborhood of size O(μ2) around the minimizer rather than O(μ). Simulation results illustrate the theoretical findings.

Published in:
Information Theory and Applications Workshop (ITA), 1-5
Presented at:
Information Theory and Applications Workshop (ITA), San Diego, CA, USA, February 12-17, 2017

 Record created 2017-12-19, last modified 2018-12-03

Download fulltext

Rate this document:

Rate this document:
(Not yet reviewed)