Preconditioned Spectral Descent for Deep Learning
Deep learning presents notorious computational challenges. These challenges include, but are not limited to, the non-convexity of learning objectives and the estimation of quantities needed by optimization algorithms, such as gradients. While we do not address the non-convexity, we present an optimization method that exploits the thus-far unused "geometry" of the objective function in order to make the best use of the estimated gradients. Previous work pursued similar goals with preconditioned methods in Euclidean space, such as L-BFGS, RMSprop, and Adagrad. In stark contrast, our approach combines a non-Euclidean gradient method with preconditioning. We provide evidence that this combination captures the geometry of the objective function more accurately than prior work. We formalize our arguments theoretically and derive novel preconditioned non-Euclidean algorithms. The results are promising in both computational time and quality when applied to Restricted Boltzmann Machines, Feedforward Neural Nets, and Convolutional Neural Nets.
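The paper's own derivations are in the PDF below; as a rough illustration of the general idea of pairing a non-Euclidean (spectral) gradient step with a preconditioner, here is a minimal NumPy sketch. It assumes a steepest-descent step taken under the norm ||D_L^{1/2} S D_R^{1/2}||_{S-inf} with diagonal preconditioners D_L and D_R accumulated RMSprop-style from the gradient's row and column second moments. The sharp-operator formula for the spectral norm (nuclear norm times U V^T of the gradient's SVD) is standard in the spectral-descent literature; the preconditioner choice, function names, and hyperparameters are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def sharp_spectral(G):
    # "#"-operator for the spectral (Schatten-infinity) norm: with
    # G = U diag(s) V^T, the steepest-descent direction is
    # sum(s) * U V^T (the dual of the spectral norm is the nuclear norm).
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return s.sum() * (U @ Vt)

def preconditioned_spectral_step(W, G, rL, rR, lr=1e-3, beta=0.9, eps=1e-8):
    # One steepest-descent step under ||D_L^{1/2} S D_R^{1/2}||_{S-inf}, i.e.
    #   W <- W - lr * D_L^{-1/2} sharp(D_L^{-1/2} G D_R^{-1/2}) D_R^{-1/2},
    # with diagonal preconditioners built RMSprop-style from the gradient's
    # row/column second moments (an illustrative choice, not the paper's).
    rL[:] = beta * rL + (1 - beta) * (G ** 2).mean(axis=1)
    rR[:] = beta * rR + (1 - beta) * (G ** 2).mean(axis=0)
    dL = (rL + eps) ** 0.25          # diagonal entries of D_L^{1/2}
    dR = (rR + eps) ** 0.25          # diagonal entries of D_R^{1/2}
    G_tilde = (G / dL[:, None]) / dR[None, :]      # preconditioned gradient
    step = (sharp_spectral(G_tilde) / dL[:, None]) / dR[None, :]
    return W - lr * step

# Toy usage: one preconditioned spectral step on a random weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((20, 10))
G = W                                # stand-in gradient
rL, rR = np.zeros(20), np.zeros(10)
W_new = preconditioned_spectral_step(W, G, rL, rR)
print(W_new.shape)                   # (20, 10)

Because the sharp operator only needs the gradient's singular vectors, the per-step overhead is one SVD of a layer-sized matrix; the diagonal preconditioners add negligible cost on top of that.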