Abstract

While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending \emph{poisoned} gradients during the training phase. Some of these approaches have been proven \emph{Byzantine-resilient}: they ensure the \emph{convergence} of SGD despite the presence of a minority of adversarial workers. We show in this paper that \emph{convergence is not enough}. In high dimension $d \gg 1$, an adversary can exploit the loss function's non-convexity to make SGD converge to \emph{ineffective} models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a \emph{margin of poisoning} of $\Omega(f(d))$, where $f(d)$ grows at least as fast as $\sqrt{d}$. Exploiting this \emph{leeway}, we build a simple attack and experimentally show that it is strongly, and in some cases completely, effective on CIFAR-10 and MNIST. We then introduce \emph{Bulyan} and prove that it significantly reduces the attacker's leeway to a narrow $\mathcal{O}(1/\sqrt{d})$ bound. We empirically show that Bulyan does not suffer from the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence \emph{as if} only non-Byzantine gradients had been used to update the model.
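
The abstract names Bulyan without spelling out the aggregation rule. Below is a minimal, illustrative NumPy sketch of a Bulyan-style aggregator as commonly described: $\theta = n - 2f$ gradients are first picked by iterating a Byzantine-resilient rule (Krum is assumed here as the underlying rule), then each coordinate of the output is the average of the $\beta = \theta - 2f$ selected values closest to the coordinate-wise median. The function names `krum_select` and `bulyan`, the NumPy framing, and the choice of Krum are assumptions made for illustration, not the authors' reference implementation.

```python
import numpy as np


def krum_select(grads: np.ndarray, f: int) -> int:
    """Return the index of the gradient with the smallest Krum score.

    The score of a gradient is the sum of squared distances to its
    n - f - 2 nearest neighbours among the pool (Blanchard et al., 2017).
    """
    n = len(grads)
    # Pairwise squared Euclidean distances, shape (n, n).
    dists = np.sum((grads[:, None, :] - grads[None, :, :]) ** 2, axis=-1)
    m = max(n - f - 2, 1)  # guard degenerate pool sizes in this sketch
    scores = np.empty(n)
    for i in range(n):
        others = np.delete(dists[i], i)          # drop the zero self-distance
        scores[i] = np.sort(others)[:m].sum()    # m closest neighbours
    return int(np.argmin(scores))


def bulyan(grads: np.ndarray, f: int) -> np.ndarray:
    """Bulyan-style aggregation of n gradients, at most f of them Byzantine.

    grads has shape (n, d). Assumes n >= 4f + 3. Phase 1 selects
    theta = n - 2f gradients by repeatedly applying Krum; phase 2 returns,
    per coordinate, the average of the beta = theta - 2f values closest
    to the coordinate-wise median of the selected gradients.
    """
    n, d = grads.shape
    assert n >= 4 * f + 3, "Bulyan-style aggregation assumes n >= 4f + 3"

    # Phase 1: recursive selection with the underlying rule (Krum here).
    theta = n - 2 * f
    remaining = list(range(n))
    selected = []
    for _ in range(theta):
        idx = krum_select(grads[remaining], f)
        selected.append(remaining.pop(idx))
    sel = grads[selected]                         # shape (theta, d)

    # Phase 2: coordinate-wise trimmed average around the median.
    beta = theta - 2 * f
    median = np.median(sel, axis=0)               # shape (d,)
    order = np.argsort(np.abs(sel - median), axis=0)[:beta]   # (beta, d)
    closest = np.take_along_axis(sel, order, axis=0)
    return closest.mean(axis=0)                   # aggregated gradient, shape (d,)
```

As a usage sketch, `bulyan(np.stack(worker_gradients), f)` would replace a plain `mean` over worker gradients in the parameter-server update; the coordinate-wise trimming in phase 2 is what shrinks the per-coordinate poisoning margin from the $\Omega(f(d))$ leeway discussed above to the $\mathcal{O}(1/\sqrt{d})$ regime claimed for Bulyan.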
