Infoscience (EPFL, École polytechnique fédérale de Lausanne)

EPFL thesis
doctoral thesis

Challenging the Assumptions: Rethinking Privacy, Bias, and Security in Machine Learning

Kulynych, Bogdan  
2023

Predictive models based on machine learning (ML) offer a compelling promise: bringing clarity and structure to complex natural and social environments. However, the use of ML models poses substantial risks to the privacy of their training data as well as to the security and reliability of their operation. This thesis explores the relationships between the privacy, security, and reliability risks of ML. Our research aims to re-evaluate the standard practices and approaches to mitigating and measuring these risks in order to understand their connections and scrutinize their effectiveness.

The first area we study is data privacy, particularly the standard privacy-preserving learning technique of differentially private (DP) training. DP training introduces controlled randomization to limit information leakage. This randomization has side effects such as performance loss and widening of performance disparities across population groups. In the thesis, we investigate additional side effects. On the positive side, we highlight the "What You See Is What You Get" property that DP training achieves. Models trained with standard methods often exhibit significant differences between training and testing phases, whereas privacy-preserving training guarantees similar behavior. Leveraging this property, we introduce competitive algorithms for group-distributionally robust optimization, addressing privacy-performance trade-offs, and mitigating robust overfitting. On the negative side, we show that decisions of DP-trained models can be arbitrary: due to the randomness in training, equally private models can yield drastically different predictions for the same input. We examine the costs of standard DP training algorithms in terms of arbitrariness, raising concerns about the justifiability of their decisions in high-stakes scenarios.
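The "controlled randomization" of DP training is typically realized as in DP-SGD: each example's gradient is clipped to bound its influence on the update, and Gaussian noise calibrated to that bound is added to the average. The sketch below is a minimal illustration of this mechanism, not code from the thesis; the function name and parameter values are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_gradient_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """One DP-SGD-style update: clip each per-example gradient to clip_norm,
    average, then add Gaussian noise scaled to the clipping bound."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Noise standard deviation follows the clip_norm * multiplier / batch-size pattern.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return mean_grad + noise
```

Because the noise is fresh at every step, two equally private training runs can diverge, which is the source of the prediction arbitrariness discussed above.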

Next, we study the standard measure of privacy leakage: the vulnerability of models to membership inference attacks. We analyze how the vulnerability to these attacks, and thus the privacy risk, is unequally distributed across population groups. We emphasize the need to consider privacy leakage across diverse subpopulations, and provide methods for doing so, in order to avoid disproportionate harm and address inequities.
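A common baseline membership inference attack predicts "member" when the model's loss on an example falls below a threshold, since examples seen during training tend to have lower loss; attack accuracy then serves as an empirical leakage measure, and computing it per subpopulation exposes unequal risk. The following is an illustrative sketch of that baseline (names and thresholds are ours, not from the thesis):

```python
import numpy as np

def loss_threshold_attack(losses, threshold):
    """Predict membership (1) when the model's loss on an example is
    below the threshold, on the premise that training examples are fit
    more closely than unseen ones."""
    return (np.asarray(losses) < threshold).astype(int)

def attack_accuracy(member_losses, nonmember_losses, threshold):
    """Balanced attack accuracy: an empirical privacy-leakage measure.
    Computing this per population group reveals disparate privacy risk."""
    tp = np.mean(loss_threshold_attack(member_losses, threshold))        # members detected
    tn = 1 - np.mean(loss_threshold_attack(nonmember_losses, threshold))  # non-members rejected
    return 0.5 * (tp + tn)
```

Evaluating `attack_accuracy` separately on each group's members and non-members, rather than only on the full population, is one way to surface the unequal distribution of leakage described above.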

Finally, our study focuses on analyzing the security risks in tabular domains, which are commonly found in high-stakes ML settings. We challenge the assumptions behind existing security evaluation methods, which primarily consider threat models based on input geometry. We highlight that real-world adversaries in these settings face practical constraints, prompting the need for cost and utility-aware threat models. We propose a framework that tailors adversarial models to tabular domains, enabling the consideration of cost and utility constraints in high-stakes decision-making situations.
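A cost-aware threat model of this kind can be contrasted with geometric ones: instead of bounding a norm of the perturbation, the adversary pays a real-world cost per feature change and seeks the cheapest edit that flips the decision. The sketch below illustrates the idea for a linear classifier over tabular features; the setup and function name are ours, not the thesis's framework.

```python
import numpy as np

def cheapest_flip(x, w, b, costs, deltas):
    """Among candidate single-feature edits (changing feature i by deltas[i],
    at price costs[i]), return (index, cost) of the cheapest edit that flips
    the sign of the linear classifier w @ x + b, or None if no edit succeeds."""
    original = np.sign(w @ x + b)
    best = None
    for i, d in enumerate(deltas):
        x_adv = x.copy()
        x_adv[i] += d                      # apply one candidate feature edit
        if np.sign(w @ x_adv + b) != original:
            if best is None or costs[i] < best[1]:
                best = (i, costs[i])       # keep the cheapest successful edit
    return best
```

Restricting the search to edits the adversary can actually afford and execute, rather than to small geometric perturbations, is the shift in threat modeling that the paragraph above argues for.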

Overall, the thesis sheds light on the subtle effects of DP training, emphasizes the importance of diverse subpopulations in risk measurements, and highlights the need for realistic threat models and security measures. By challenging assumptions and re-evaluating risk mitigation and measurement approaches, the thesis paves the way for more robust and ethically grounded studies of ML risks.

  • Files

Name: EPFL_TH9523.pdf
Type: N/A
Access type: openaccess
License Condition: copyright
Size: 1.85 MB
Format: Adobe PDF
Checksum (MD5): b69de25a263f31703638fe0aef43a8f4

  • Contact: infoscience@epfl.ch

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.