System and method for privacy-preserving distributed training of machine learning models on distributed datasets

Froelicher, David; Troncoso-Pastoriza, Juan Ramón; Pyrgelis, Apostolos; Sav, Sinem; Gomes de Sá e Sousa, Joao André; Hubaux, Jean-Pierre; Bossuat, Jean-Philippe

2021-12-03

https://infoscience.epfl.ch/handle/20.500.14299/183469

A computer-implemented method and a distributed computer system (100) for privacy-preserving distributed training of a global model on distributed datasets (DS1 to DSn). The system has a plurality of communicatively coupled data providers (DP1 to DPn). Each data provider has a respective local model (LM1 to LMn) and a respective local training dataset (DS1 to DSn) for training the local model using an iterative training algorithm (IA). Each data provider further has a portion of a cryptographic distributed secret key (SK1 to SKn) and a corresponding collective cryptographic public key (CPK) of a multiparty fully homomorphic encryption scheme, with the local and global models being encrypted with the collective public key. Each data provider (DP1) trains its local model (LM1) on the respective local training dataset (DS1) by executing gradient descent updates of its local model (LM1) and combining (1340) the updated local model (LM1') with the current global model (GM) into a current local model (LM1c). At least one data provider homomorphically combines the current local models of at least a subset of the data providers into a combined model (CM1) and updates the current global model (GM) based on the combined model. The updated global model is provided to at least a subset of the other data providers.

patent

US2023188319; EP4136559; CA3177895; WO2021223873

70680502
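
The training flow described in the abstract (local gradient descent updates, combination of the updated local model with the current global model, aggregation of the current local models, and update of the global model) can be illustrated with a minimal plaintext sketch. Everything here is an assumption made for illustration: encryption under the collective public key is omitted entirely, the model is a simple linear regressor, and the function names, the mixing rule in combine_with_global, and the plain averaging in aggregate are hypothetical stand-ins, not the claimed method. In the described system the models would be ciphertexts and the aggregation would be carried out homomorphically.

```python
# Plaintext sketch only: no encryption is performed here. In the described
# system, all models would be encrypted under the collective public key (CPK)
# and the aggregation step would be a homomorphic operation.

def local_update(weights, dataset, lr=0.3, steps=5):
    """Run a few gradient descent steps of a linear model y = w . x
    on one data provider's local dataset (list of (x, y) pairs)."""
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in dataset:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(dataset)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def combine_with_global(updated_local, global_model, mix=0.5):
    """Hypothetical combination rule (assumption): a convex mix of the
    updated local model (LM') and the current global model (GM)."""
    return [mix * l + (1.0 - mix) * g
            for l, g in zip(updated_local, global_model)]

def aggregate(current_locals):
    """Combine the current local models; a plain average is assumed here."""
    n = len(current_locals)
    return [sum(ws) / n for ws in zip(*current_locals)]

def train(datasets, dim, rounds=50):
    """Iterate local training, combination, and global update."""
    global_model = [0.0] * dim
    local_models = [[0.0] * dim for _ in datasets]
    for _ in range(rounds):
        for i, ds in enumerate(datasets):
            updated = local_update(local_models[i], ds)
            local_models[i] = combine_with_global(updated, global_model)
        # One provider aggregates and shares the updated global model.
        global_model = aggregate(local_models)
    return global_model
```

Calling train with one list of (x, y) samples per data provider returns the final global weights. The sketch shows only the order of the steps; it provides none of the privacy guarantees of the multiparty homomorphic scheme.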