Data-Aware Privacy-Preserving Machine Learning

Triastcyn, Aleksei

doi:10.5075/epfl-thesis-7216

2020

Abstract

In this thesis, we focus on the problem of achieving practical privacy guarantees in machine learning (ML), where classic differential privacy (DP) fails to maintain a good trade-off between user privacy and data utility. A differential privacy guarantee can be strongly influenced by extreme outliers or by samples outside the data distribution. For example, when protecting a classification model for magnetic resonance imaging (MRI), differentially private mechanisms would add enough noise to hide any image of the same dimensionality, including images that do not belong to the intended data distribution (cars, houses, animals, and so on). Such generality inevitably yields poor privacy guarantees. Based on these observations and the ideas of DP, we propose a data-aware approach to privacy in machine learning. We design two novel privacy notions, Average-Case Differential Privacy (ADP) and Bayesian Differential Privacy (BDP), which take the data distribution into account and significantly improve the privacy-utility balance. First, we present average-case differential privacy, an empirical privacy notion designed for ex post privacy analysis of generative models and privacy-preserving data publishing. It relaxes the worst-case requirement of differential privacy to the average case and relies on empirical estimation to deal with undefined distributions. This notion can be regarded as a statistical sensitivity measure: it measures the expected change in the model outcomes given a change in the inputs generated by an observed distribution. Second, we develop a more rigorous privacy notion, Bayesian differential privacy, based on the same high-level principle of a probabilistic sensitivity measure.
As the main theoretical contributions of this thesis, we formulate and prove basic properties of Bayesian DP, such as composition, group privacy, and resistance to post-processing, and we develop a novel privacy accounting method for iterative algorithms based on the advanced composition theorem. Furthermore, we show connections between our accountant and the well-known moments accountant, as well as between Bayesian DP and other privacy definitions. Our practical contributions and evaluation span three main areas: (1) privacy-preserving data release using generative adversarial networks (GANs); (2) private classification using convolutional neural networks and other ML models; and (3) private federated learning (FL) for both discriminative and generative models. We demonstrate that both notions allow us to achieve considerably higher utility than differential privacy, and that Bayesian DP provides a superior trade-off between privacy guarantees and output model quality in all settings.
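For orientation only, the sketch below computes the classic (ε, δ)-DP advanced composition bound (Dwork, Rothblum and Vadhan), the theorem the abstract names as the basis of the accountant. It is not the thesis's Bayesian DP accountant; the function names are hypothetical, and the example assumes k adaptive runs of a mechanism that is (eps, delta)-DP, giving an overall (eps_total, k*delta + delta_prime)-DP guarantee.

```python
import math


def basic_composition_epsilon(eps: float, k: int) -> float:
    """Naive bound: epsilons simply add up over k mechanisms."""
    return k * eps


def advanced_composition_epsilon(eps: float, k: int, delta_prime: float) -> float:
    """Advanced composition bound for k adaptive (eps, delta)-DP mechanisms.

    The composed mechanism is (eps_total, k * delta + delta_prime)-DP with
    eps_total = sqrt(2k ln(1/delta_prime)) * eps + k * eps * (e^eps - 1).
    """
    if eps < 0 or k < 1 or not (0 < delta_prime < 1):
        raise ValueError("require eps >= 0, k >= 1 and 0 < delta_prime < 1")
    sqrt_term = math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
    linear_term = k * eps * math.expm1(eps)  # expm1(x) = e^x - 1, accurate for small x
    return sqrt_term + linear_term


if __name__ == "__main__":
    # Hypothetical setting: 100 iterations, each (0.1, delta)-DP.
    print(basic_composition_epsilon(0.1, 100))            # 10.0
    print(advanced_composition_epsilon(0.1, 100, 1e-5))   # roughly 5.85
```

The bound grows roughly with the square root of k rather than linearly, which is why accountants built on it (and tighter ones such as the moments accountant) matter for iterative training.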

Details

Title Data-Aware Privacy-Preserving Machine Learning

Author(s) Triastcyn, Aleksei

Advisor(s) Faltings, Boi

Pagination 161

Date 2020

Publisher Lausanne, EPFL

Keywords privacy-preserving machine learning; privacy-preserving data release; differential privacy; deep learning; federated learning; generative adversarial networks

Language English

DOI https://doi.org/10.5075/epfl-thesis-7216

Laboratories LIA

Record appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > LIA - Artificial Intelligence Laboratory; Scientific production and competences > EPFL Theses; Work produced at EPFL; Published; Theses

Record creation date 2020-10-09
