Authors: Müller, Kilian; Launay, Julien; Poli, Iacopo; Filipovich, Matthew; Capelli, Alessandro; Hesslow, Daniel; Carron, Igor; Daudet, Laurent; Krzakala, Florent; Gigan, Sylvain
Date: 2023-09-11
DOI: 10.1109/CLEO/Europe-EQEC57999.2023.10231380
URL: https://infoscience.epfl.ch/handle/20.500.14299/200678
Title: Artificial Neural Network Training on an Optical Processor via Direct Feedback Alignment
Type: conference paper

Abstract: Artificial Neural Networks (ANNs) are typically trained via the back-propagation (BP) algorithm. This approach has been extremely successful: current models like GPT-3 have O(10^11) parameters, are trained on O(10^11) words, and produce awe-inspiring results. However, there are good reasons to look for alternative training methods: with current algorithms and hardware constraints, sometimes only half of the available computing power is actually used. This is due to a complicated interplay between the size of the ANN, the available memory, the throughput limitations of interconnects, the architecture of the network of computers, and the training algorithm. Training a model like the aforementioned GPT-3 takes months and costs millions. A different training paradigm, one that could make clever use of specialized hardware, may train large ANNs more efficiently.
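
The alternative paradigm named in the title is Direct Feedback Alignment (DFA), which replaces BP's backward pass with fixed random feedback matrices that project the output error directly to each hidden layer (Nøkland, 2016). Below is a minimal NumPy sketch for a one-hidden-layer classifier; the layer sizes, initialization scales, and learning rate are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes for illustration.
n_in, n_hid, n_out = 784, 256, 10
W1 = rng.normal(0.0, 0.05, (n_in, n_hid))
W2 = rng.normal(0.0, 0.05, (n_hid, n_out))
# Fixed random feedback matrix: stands in for W2.T of back-propagation
# and is never trained.
B1 = rng.normal(0.0, 0.05, (n_out, n_hid))

def dfa_step(W1, W2, x, y, lr=0.01):
    """One DFA update on a batch; x is (batch, n_in), y is one-hot (batch, n_out)."""
    # Forward pass.
    a1 = x @ W1
    h1 = np.tanh(a1)
    logits = h1 @ W2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    e = p - y  # output error for softmax + cross-entropy

    # DFA: project the output error straight to the hidden layer through
    # the fixed random matrix B1 -- no backward pass through W2.
    d1 = (e @ B1) * (1.0 - h1 ** 2)  # tanh'(a1) = 1 - tanh(a1)^2

    W2 -= lr * h1.T @ e / len(x)
    W1 -= lr * x.T @ d1 / len(x)
    return W1, W2
```

The design point that matters for specialized hardware: the only step DFA adds over the forward pass is the fixed random projection `e @ B1`, which is presumably the operation mapped onto the optical processor here, whereas BP would have to propagate the error backward through the trained weights layer by layer.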