Infoscience

doctoral thesis

Theoretical characterization of uncertainty in high-dimensional machine learning

Clarte, Lucas Andry  
2025

Uncertainty quantification is a crucial aspect of modern machine learning: in some applications, for instance medical diagnosis, assessing the confidence of a prediction is as important as achieving good accuracy, if not more so. Yet the theoretical understanding of uncertainty quantification is still limited, partly due to the complexity of the models used in practice and the diversity of available methods. This thesis contributes to the theoretical understanding of uncertainty quantification in supervised learning by analyzing several popular methods through the lens of high-dimensional statistics. The manuscript is divided into three parts, each focusing on a different approach to uncertainty quantification.

The first part, comprising Chapters 2, 3 and 4, is devoted to classification: for classification tasks, uncertainty can be naturally quantified by the estimated probability of each class. The main challenge is then to obtain a well-calibrated classifier, whose confidence matches the accuracy of its predictions. In this first part, we analyze the calibration of Bayesian and frequentist methods for overparametrized models. In particular, we show that frequentist methods are as well calibrated as Bayesian methods, at a fraction of the computational cost. Moreover, the calibration of both methods is affected by the double-descent phenomenon. Lastly, we propose a new method to calibrate pre-trained neural networks, called Expectation Consistency, which builds on the commonly used temperature scaling algorithm. We evaluate our method on image classification tasks and show that it is more robust to noisy labels than temperature scaling.
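As a point of reference for the baseline mentioned above (temperature scaling, not the thesis's Expectation Consistency method), here is a minimal NumPy sketch: a single scalar temperature T is fitted on held-out logits by minimizing the negative log-likelihood of logits / T. The data is synthetic and all names are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels):
    """Negative log-likelihood of the labels under softmax(logits)."""
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels):
    """Grid-search the scalar T minimizing validation NLL of logits / T."""
    grid = np.concatenate([[1.0], np.linspace(0.1, 5.0, 200)])
    losses = [nll(logits / T, labels) for T in grid]
    return grid[int(np.argmin(losses))]

# Synthetic validation set: large-margin (overconfident) logits with
# roughly 20% label noise, a regime where scaling with T > 1 helps.
rng = np.random.default_rng(0)
n, k = 500, 3
labels = rng.integers(0, k, size=n)
logits = rng.normal(size=(n, k))
logits[np.arange(n), labels] += 4.0      # large margins -> overconfidence
noisy = labels.copy()
flip = rng.random(n) < 0.2
noisy[flip] = rng.integers(0, k, size=flip.sum())

T = fit_temperature(logits, noisy)
```

Since T = 1 (no rescaling) is in the search grid, the fitted temperature can only improve the validation negative log-likelihood.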

Then, in Chapters 5 and 6, we analyze resampling methods for uncertainty quantification. This analysis is motivated by the observation that the bootstrap, despite being a cornerstone of classical statistics, appears to fail in high dimensions. We show that even with proper regularization, the bootstrap fails in the high-dimensional regime.
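To make the object of study concrete, here is a minimal sketch of the classical pairs bootstrap for a low-dimensional least-squares problem, i.e. the regime n ≫ d where it is a reliable tool; the high-dimensional breakdown analyzed in these chapters is precisely what this classical recipe misses. The data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 2                      # classical regime: n >> d
X = rng.normal(size=(n, d))
beta_true = np.array([1.5, -0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def ols(X, y):
    """Ordinary least-squares estimate."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Pairs bootstrap: resample (x_i, y_i) with replacement and refit.
B = 1000
boot = np.empty((B, d))
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = ols(X[idx], y[idx])

# Percentile 95% confidence interval for each coefficient.
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
```

The spread of the refitted coefficients across resamples serves as a proxy for the sampling variability of the estimator; it is this proxy that stops being faithful when d grows proportionally to n.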

Motivated by this analysis of the bootstrap, we develop in Chapter 7 an algorithm that produces prediction intervals for regression based on full conformal prediction. Conformal prediction is a framework for building valid prediction intervals under very few assumptions on the estimator and the data.
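The thesis works with full conformal prediction; the sketch below instead uses the simpler split (inductive) variant, which shares the same finite-sample validity guarantee, just to show the mechanics: hold out a calibration set, score it by absolute residuals, and widen every prediction by the appropriate empirical quantile. All data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    X = rng.uniform(-3, 3, size=(n, 1))
    y = X[:, 0] + rng.normal(scale=0.3, size=n)
    return X, y

# Split the data: one half fits the model, the other calibrates.
Xf, yf = make_data(200)            # fit set
Xc, yc = make_data(200)            # calibration set

w = np.linalg.lstsq(Xf, yf, rcond=None)[0]   # any regressor would do

def predict(Z):
    return Z @ w

# Conformity scores: absolute residuals on the calibration set.
scores = np.abs(yc - predict(Xc))
alpha = 0.1                                   # target 90% coverage
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q = np.sort(scores)[k - 1]                    # finite-sample-corrected quantile

# Prediction interval for new points: [prediction - q, prediction + q].
Xt, yt = make_data(500)
pred = predict(Xt)
covered = np.mean((pred - q <= yt) & (yt <= pred + q))
```

The ceil((n+1)(1-alpha)) rank is the finite-sample correction that makes the marginal coverage guarantee hold exactly under exchangeability, regardless of how good the underlying regressor is.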

File details:

  • Name: EPFL_TH11573.pdf
  • Type: Main Document
  • Version: Not Applicable (or Unknown)
  • Access type: open access
  • License Condition: N/A
  • Size: 29.5 MB
  • Format: Adobe PDF
  • Checksum (MD5): 4f212949e22afc6cd5a0efad9dad6980


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.