Fast and Robust Distributed Learning in High Dimension

El-Mhamdi, El-Mahdi; Guerraoui, Rachid; Rouault, Sebastien

doi:10.1109/SRDS51746.2020.00015

conference paper

September 21, 2020

2020 International Symposium on Reliable Distributed Systems (SRDS)

IEEE 39th International Symposium on Reliable Distributed Systems (SRDS 2020)

Could a gradient aggregation rule (GAR) for distributed machine learning be both robust and fast? This paper answers in the affirmative through Multi-Bulyan. Given n workers, f of which are arbitrarily malicious (Byzantine) and m = n − f are not, we prove that Multi-Bulyan can ensure a strong form of Byzantine resilience, as well as an m / n slowdown compared to averaging, the fastest (but non-Byzantine-resilient) rule for distributed machine learning. When m ≈ n (almost all workers are correct), Multi-Bulyan reaches the speed of averaging. We also prove that Multi-Bulyan's cost in local computation is O(d) (like averaging), an important feature for ML, where the dimension d commonly reaches 10⁹, while robust alternatives have at least quadratic cost in d. Our theoretical findings are complemented by an experimental evaluation which, in addition to supporting the linear O(d) complexity argument, shows that Multi-Bulyan's parallelisability further adds to its efficiency.
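
To illustrate why averaging is described above as fast but not Byzantine resilient, the following is a minimal sketch on toy data. It does not implement Multi-Bulyan: the robust rule shown is a plain coordinate-wise median, used here only as a simple stand-in. The function names, the worker counts (n = 5, f = 1), and the gradient values are all hypothetical and chosen for illustration.

```python
from statistics import median


def average(gradients):
    """Coordinate-wise mean of equal-length gradient vectors."""
    n = len(gradients)
    return [sum(coords) / n for coords in zip(*gradients)]


def coordinate_median(gradients):
    """Coordinate-wise median: a simple robust stand-in, NOT Multi-Bulyan."""
    return [median(coords) for coords in zip(*gradients)]


# Hypothetical toy setting: n = 5 workers, f = 1 Byzantine, dimension d = 2.
honest = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
byzantine = [[1000.0, -1000.0]]  # one arbitrary (malicious) gradient
gradients = honest + byzantine

avg = average(gradients)            # pulled far away from the honest value
med = coordinate_median(gradients)  # stays at the honest value [1.0, 2.0]

print("average:", avg)
print("median: ", med)
```

In this toy case a single Byzantine worker moves the average arbitrarily far from the honest gradients, while the median is unaffected. Both functions visit each coordinate once per worker, so their cost grows linearly with d; the resilience guarantees and the m / n slowdown stated in the abstract are specific to Multi-Bulyan and are not reproduced by this sketch.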

Name: srds20-paper.pdf

Type: Postprint

Version: Accepted version

Access type: openaccess

License Condition: n/a

Size: 1.9 MB

Format: Adobe PDF

Checksum (MD5): b29fb18534d8705a274fe77495121fd1