Abstract

The increasing prevalence of personal devices motivates the design of algorithms that can leverage their computing power, together with the data they generate, to build effective, privacy-preserving machine learning models. However, traditional distributed learning algorithms impose a uniform workload on all participating devices, often simply discarding the weakest participants. This not only underuses the available computational resources, but also significantly degrades the quality of the learning process, since the data held by the slowest devices is excluded from the procedure.

This paper proposes HGO, a distributed learning scheme with parameterizable iteration costs that can be adjusted to the computational capabilities of the different devices. HGO encourages the participation of slower devices, thereby improving the accuracy of the model when the participants do not share the same dataset. When combined with a robust aggregation rule, HGO tolerates a level of Byzantine behavior that depends on the hardware profiles of the devices (we prove, for the first time, a tradeoff between Byzantine tolerance and hardware heterogeneity). We also establish the convergence of HGO, both theoretically and empirically, without assuming any specific partitioning of the data across devices. Finally, we present an extensive set of experiments that evaluate the performance of HGO on several classification tasks and highlight the importance of incorporating slow devices when learning in a Byzantine-prone environment with heterogeneous participants.
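For intuition only, the sketch below illustrates the general pattern described above, not HGO itself: each device runs a number of local gradient steps matched to its computational capacity, and the server combines the resulting models with a robust aggregation rule (here, a coordinate-wise trimmed mean). All function names, step counts, and the synthetic data are illustrative assumptions, not elements of the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def local_update(w, X, y, num_steps, lr=0.05):
        # Device-specific number of SGD steps on a local least-squares objective.
        w = w.copy()
        for _ in range(num_steps):
            i = rng.integers(len(X))
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
        return w

    def trimmed_mean(updates, trim=1):
        # Coordinate-wise trimmed mean: drop the `trim` smallest and largest
        # values in each coordinate before averaging (a standard robust rule).
        U = np.sort(np.stack(updates), axis=0)
        return U[trim:len(updates) - trim].mean(axis=0)

    # Synthetic setup: 5 devices with different local data and different speeds.
    d = 10
    w_true = rng.normal(size=d)
    devices = []
    for k in range(5):
        X = rng.normal(loc=0.1 * k, size=(50, d))    # non-identical local datasets
        y = X @ w_true + 0.01 * rng.normal(size=50)
        devices.append((X, y))
    steps_per_device = [1, 2, 4, 8, 16]              # matched to assumed device capability

    w = np.zeros(d)
    for _ in range(200):
        updates = [local_update(w, X, y, s) for (X, y), s in zip(devices, steps_per_device)]
        # A Byzantine device could return an arbitrary vector here; the trimmed
        # mean limits its influence as long as honest devices form a majority.
        w = trimmed_mean(updates, trim=1)

    print("distance to w_true:", np.linalg.norm(w - w_true))

In this toy setting, the per-device step counts stand in for the parameterizable iteration costs mentioned above: slower devices contribute fewer local steps instead of being dropped, while the robust aggregation limits the impact of misbehaving participants.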
