doctoral thesis

Practical Byzantine-resilient Stochastic Gradient Descent

Rouault, Sébastien Louis Alexandre  
2022

Algorithms are everywhere. The recipe for the frangipane cake is an algorithm: if all the listed ingredients are available and the cook is sufficiently deft, then after a finite number of small, well-defined steps, a delicious dessert will exit the oven. Now, what if the grocer who sold the cook the ingredients mislabeled some of them? If salt had been mistakenly labeled and sold as sugar, those tasting the cake would probably be disgusted and stop eating. If a dangerous drug had been maliciously sold as sugar, the consequences could be far worse.

When it comes to security, machine learning is perhaps closer to cooking than to other families of computer programs. Follow the recipe for the Margherita pizza using cream and onions instead of tomatoes and mozzarella, and you may pull a Flammekueche out of the oven. Likewise, what the practitioner gets at the end of training certainly depends on the relevance of the model and other hyperparameters, but ultimately the output parameters are determined by the inputs: the training set. While in cookery we simply trust the supply chain and blame the cook for mistakes, we should expect machine learning algorithms, like any other computer program, to gracefully handle (partially) malformed and malicious datasets and inputs.

Byzantine machine learning studies malicious behaviors during the training phase. A particular, growing body of work has been tackling Byzantine failures in the context of distributed Stochastic Gradient Descent (SGD), in which a central server distributes work to, and safely aggregates gradients from, several worker nodes, some of which may be adversarial.
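
As a rough illustration of why the aggregation step matters (a minimal PyTorch sketch, not code from this thesis): the plain mean can be dragged arbitrarily far by a single Byzantine gradient, whereas a statistically robust rule such as the coordinate-wise median cannot.

    import torch

    def aggregate_mean(gradients):
        # Plain averaging: a single adversarial gradient can shift the
        # result arbitrarily far in any direction.
        return torch.stack(gradients).mean(dim=0)

    def aggregate_median(gradients):
        # Coordinate-wise median: a classic statistically robust GAR,
        # unaffected by a minority of arbitrarily corrupted gradients.
        return torch.stack(gradients).median(dim=0).values

    # Hypothetical round with n = 5 workers, one of them Byzantine.
    honest = [1.0 + 0.1 * torch.randn(10) for _ in range(4)]
    byzantine = [torch.full((10,), -1e6)]  # arbitrarily corrupted gradient
    gradients = honest + byzantine

    print(aggregate_mean(gradients))    # dominated by the attacker
    print(aggregate_median(gradients))  # stays close to the honest ~1.0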

On the theoretical side, this thesis aims to advance the state of the art on two fronts. Existing works had always assumed a trusted, central parameter server. We propose a novel algorithm that offers the same proven guarantees as previous works, but does not require any node to be trusted and can operate under network asynchrony. Two other contributions tackle problems, in either the construction of or an assumption common to statistically robust Gradient Aggregation Rules (GARs), that made them very vulnerable to attacks in practice. We put a substantial emphasis on devising pragmatic, easy-to-implement, and computationally cheap techniques to address these issues.
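
For concreteness, here is a sketch of one well-known statistically robust GAR from this line of work, Krum (Blanchard et al., 2017), shown only as background on the family of rules discussed above, not as one of the thesis's contributed algorithms.

    import torch

    def krum(gradients, f):
        # Krum (Blanchard et al., 2017): score each gradient by the sum
        # of squared distances to its n - f - 2 nearest neighbours, and
        # return the lowest-scoring gradient. Requires n >= 2f + 3.
        n = len(gradients)
        assert n >= 2 * f + 3, "Krum requires n >= 2f + 3 workers"
        stacked = torch.stack(gradients)            # shape (n, d)
        dists = torch.cdist(stacked, stacked) ** 2  # pairwise squared L2
        scores = []
        for i in range(n):
            others = torch.cat([dists[i, :i], dists[i, i + 1:]])
            closest = torch.topk(others, n - f - 2, largest=False).values
            scores.append(closest.sum())
        return gradients[int(torch.argmin(torch.stack(scores)))]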

On the system side, this thesis implements many of the existing and contributed GARs, integrated into two major machine learning frameworks (PyTorch and TensorFlow). These algorithms are assessed at every scale, from microbenchmarks on a single GPU to actual networked deployments on many nodes. The associated code has been open-sourced.
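
A microbenchmark in that spirit might look as follows (a hypothetical harness, not the thesis's actual code): on a single GPU, device-side time is measured with CUDA events rather than wall-clock time, since CUDA kernels are launched asynchronously.

    import torch

    def benchmark_gar(gar, n=25, d=1_000_000, repeats=10):
        # Time one aggregation of n gradients of dimension d on the GPU.
        grads = [torch.randn(d, device="cuda") for _ in range(n)]
        gar(grads)  # warm-up (allocator, kernel caches)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(repeats):
            gar(grads)
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / repeats  # milliseconds per call

For instance, benchmark_gar(aggregate_median) could be compared against benchmark_gar(aggregate_mean) from the first sketch to estimate the cost of robust aggregation relative to plain averaging.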

Type
doctoral thesis
DOI
10.5075/epfl-thesis-8999
Author(s)
Rouault, Sébastien Louis Alexandre  
Advisors
Guerraoui, Rachid  
Jury
Prof. Viktor Kuncak (president); Prof. Rachid Guerraoui (thesis director); Prof. Martin Jaggi, Dr. Matthias Seeger, Dr. Miguel Ángel Veganzones (examiners)
Date Issued
2022
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2022-02-24
Thesis number
8999
Number of pages
143

Subjects
Machine Learning • Byzantine Fault • Distributed • Decentralized • Stochastic Gradient Descent • CUDA • PyTorch • TensorFlow

EPFL units
LPD  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
February 10, 2022
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/185240