Collaborative machine learning refers to a paradigm in which multiple participants jointly train a shared model using their respective local datasets, without exchanging or disclosing the raw data. This paradigm, which includes both server-mediated federated learning (FL) and serverless decentralized learning (DL), is critical for scaling machine learning in sensitive domains such as healthcare and finance. However, practical deployment faces two fundamental challenges: ensuring robustness against unreliable or malicious participants, often referred to as Byzantine adversaries, and achieving efficiency in terms of communication overhead and convergence speed. The distributed nature of these systems makes them vulnerable to faulty or adversarial behavior, while network and resource constraints limit scalability. This thesis aims to design algorithms that are both provably robust and practically efficient for large-scale collaborative learning.
To address this challenge in the widely adopted FL setting, we develop a unified theoretical analysis for robust and efficient federated learning. In practice, FL relies on client subsampling and multiple local updates to scale to large networks and reduce communication overhead. While essential for efficiency, these techniques complicate defense against Byzantine adversaries, as they can amplify the relative influence of attackers in a given round and induce benign client drift that is difficult to distinguish from malicious updates. We rigorously analyze the interplay between robust aggregation rules and these efficiency-driven mechanisms. Our analysis, which applies to a broad class of aggregators, provides novel convergence guarantees and offers key actionable insights. For instance, we show that beyond a certain threshold, sampling additional clients yields diminishing returns in terms of robustness.
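To make the interplay concrete, the following minimal sketch pairs uniform client subsampling with a coordinate-wise trimmed mean, one representative member of the broad class of robust aggregators; the function names, the specific aggregator, and the parameters are illustrative assumptions, not the thesis's actual analysis or aggregation rule. Note that `trim` must be set against the number of Byzantine clients that may appear in the *sampled* subset, which subsampling can inflate relative to their population fraction.

```python
import numpy as np

def trimmed_mean(updates: np.ndarray, trim: int) -> np.ndarray:
    """Coordinate-wise trimmed mean: in each coordinate, drop the `trim`
    smallest and `trim` largest values across clients, then average."""
    sorted_updates = np.sort(updates, axis=0)
    return sorted_updates[trim : len(updates) - trim].mean(axis=0)

def federated_round(client_updates, sample_size, trim, rng):
    """One illustrative FL round: uniformly subsample `sample_size`
    clients without replacement, then robustly aggregate their updates."""
    idx = rng.choice(len(client_updates), size=sample_size, replace=False)
    sampled = np.stack([client_updates[i] for i in idx])
    return trimmed_mean(sampled, trim)
```

For example, with updates `[1, 2, 3, 100]` and `trim=1`, the outlier `100` (and the smallest value `1`) are discarded before averaging, so a single extreme client cannot drag the aggregate arbitrarily far.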
For the more challenging serverless decentralized setting, this thesis introduces two novel algorithms. First, we present MoNNA, a provably robust algorithm that tolerates Byzantine failures with only linear gradient computation overhead, which is tight. MoNNA integrates Polyak's momentum for local updates with nearest-neighbor averaging (NNA) for global mixing, allowing honest nodes to effectively isolate and ignore malicious updates from their peers. Second, to address the efficiency challenge in decentralized learning, we propose Epidemic Learning (EL), an algorithm that leverages randomized, dynamic communication to accelerate convergence and improve scalability. By enabling nodes to communicate with randomly selected peers in each round, EL promotes rapid information diffusion throughout the network and overcomes the bottlenecks of static communication topologies, thereby providing a provably faster convergence guarantee.
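The nearest-neighbor averaging (NNA) step at the heart of MoNNA can be sketched as follows: each honest node keeps only the peer updates closest to its own (here in Euclidean distance), discards as many as there may be Byzantine neighbors, and averages the remainder with its own update. This is a simplified illustration under assumed interfaces (the function name and signature are hypothetical); it omits the Polyak momentum step and the rest of the MoNNA protocol.

```python
import numpy as np

def nearest_neighbor_average(own: np.ndarray,
                             peer_updates: list,
                             num_byzantine: int) -> np.ndarray:
    """Keep the peer updates nearest to `own`, dropping the
    `num_byzantine` farthest ones, and average the rest with `own`."""
    dists = [np.linalg.norm(u - own) for u in peer_updates]
    keep = np.argsort(dists)[: len(peer_updates) - num_byzantine]
    return np.mean([own] + [peer_updates[i] for i in keep], axis=0)
```

Because a Byzantine peer must send an update close to the honest node's own update to survive the filter, its room for manipulation is bounded, which is what lets honest nodes effectively isolate malicious inputs.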
Collectively, these contributions advance the theoretical and practical foundations of collaborative learning by addressing its core challenges in robustness and efficiency. Through rigorous analysis and algorithmic innovation, this work lays the groundwork for secure and scalable collaborative learning systems that are resilient to adversarial behavior and suitable for deployment in real-world, untrusted environments.
École Polytechnique Fédérale de Lausanne
Prof. Rüdiger Urbanke (president); Prof. Rachid Guerraoui (thesis director); Prof. Patrick Thiran, Prof. Reza Shokri, Prof. Aurélien Bellet (examiners)
2025
Lausanne
2025-11-21
10889
200