On the performance of random reshuffling in stochastic learning
In empirical risk optimization, it has been observed that gradient descent implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data randomly and independently of each other. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. Some of these justifications rely on loose bounds, or their conclusions are dependent on the sample size which is problematic for large datasets. This work focuses on constant step-size adaptation, where the agent is continuously learning. In this case, convergence is only guaranteed to a small neighborhood of the optimizer albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms independent sampling by showing that the iterate at the end of each run approaches a smaller neighborhood of size O(μ2) around the minimizer rather than O(μ). Simulation results illustrate the theoretical findings.
ita_2017.pdf
Postprint
openaccess
465.21 KB
Adobe PDF
40a5ed6ddd7167a5c3da1b2891579a5d