Stochastic Zeroth-Order Optimisation Algorithms with Variance Reduction

Introduction of optimisation problems in which the objective function is black box or obtaining the gradient is infeasible, has recently raised interest in zeroth-order optimisation methods. As an example finding adversarial examples for Deep Learning models (Chen et al. (2017); Moosavi-Dezfooli et al. (2016)) is one of the most common applications in which zeroth-order methods could be used. These optimisation methods use only function values at certain points to estimate the gradient. Most current approaches iteratively sample a random search direction along which they compute an estimation of the gradient (Nesterov and Spokoiny (2017); Conn et al. (2009); Wibisono et al. (2012)). However, due to the high variance in the search direction, these methods usually need d times more iterations than the standard gradient methods, where d is the dimensionality of the problem. So it seems that the main effort for improving the zeroth-order methods should be in reducing the variance of the gradient estimate. In this work we will analyse the gradient-free oracle which uses random directions sampled form a Gaussian distribution. Our analysis shows that in smooth and strongly convex setting, we have a convergence rate of O( d/T) which clearly shows the dependency to the dimension of the problem. Furthermore we propose some variance reduction methods to make the zeroth-order optimisation faster. We experiment our proposed methods in Python to compare their convergence in stochastic and non-stochastic setting. Our empirical results show that in a setting that number of allowed function evaluation is fixed, using a variance reduction method (e.g. momentum) can make the convergence of zeroth-order methods happen faster.

Stich, Sebastian Urban
Jaggi, Martin
Jun 21 2019
Additional link:

 Record created 2019-07-02, last modified 2019-09-17

Download fulltext

Rate this document:

Rate this document:
(Not yet reviewed)