Data Heterogeneity in Linear Distributed Estimation and Learning
One major challenge in distributed learning is to learn efficiently for each client when the data across clients is heterogeneous, i.e., not independent and identically distributed (non-i.i.d.). The difficulty is that the data of other clients may not be helpful to a given client. This raises the following question: can each client's performance be improved by access to the data of the other clients in this heterogeneous setting? A further challenge is to obtain a good personalized model while still preserving the privacy of the local data samples.
We consider a model in which the client data distributions are not identical and may be dependent. In this heterogeneous data setting we study the problem of distributed learning of ground-truth parameters. Every client runs the same, possibly non-linear, estimation algorithm on its own local data to estimate its ground truth. Within this model we propose a measure of data heterogeneity, which we use to construct a personalized combined linear estimator for each client. We show that this estimator is never worse than any chosen unbiased local estimator and can be substantially better, by up to a factor equal to the number of clients. We further show that the combined estimator still concentrates around the true value of the ground truth when the local estimator is unbiased. The estimator can be implemented with privacy-preserving schemes in both the cryptographic and the differentially private settings.
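The abstract does not give the estimator's closed form. As an illustration only, here is a minimal sketch assuming the combined estimator is a convex combination of the clients' local estimates, with weights chosen to minimize a mean-squared-error upper bound built from the local estimator variances and a pairwise heterogeneity measure; the names combined_weights, sigma2, and d are hypothetical and not taken from the thesis.

```python
import numpy as np
from scipy.optimize import minimize


def combined_weights(sigma2, d, i):
    """Weights for client i's personalized combination of all local estimates.

    sigma2 : (n,) variances of the clients' unbiased local estimators
    d      : (n, n) heterogeneity measure; d[i, j] bounds |theta_j - theta_i|
    Minimizes the MSE upper bound
        sum_j w_j^2 * sigma2_j + (sum_j w_j * d[i, j])^2
    over the probability simplex (weights non-negative, summing to one).
    """
    n = len(sigma2)

    def mse_bound(w):
        # variance of the combination plus worst-case squared bias
        return np.sum(w**2 * sigma2) + np.dot(w, d[i]) ** 2

    w0 = np.full(n, 1.0 / n)  # start from the uniform average
    res = minimize(
        mse_bound,
        w0,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        bounds=[(0.0, 1.0)] * n,
    )
    return res.x


if __name__ == "__main__":
    n = 10
    sigma2 = np.ones(n)      # equal local variances
    d = np.zeros((n, n))     # homogeneous case: identical ground truths
    print(combined_weights(sigma2, d, i=0))  # ~uniform weights, bound ~ 1/n
```

Under these assumptions the sketch reflects both claims above: with d identically zero and equal variances the optimal weights are uniform and the MSE bound improves by a factor of n over the local estimator, while for large d[i, j] the weight concentrates on client i itself, since putting all weight on the local estimate is always feasible, matching the never-worse guarantee.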