Infoscience

doctoral thesis

Kernel Methods and Similarity Learning Applied in Computational Chemistry

Fabregat I De Aguilar-Amat, Raimon  
2022

Over the last two decades, data-powered machine learning (ML) tools have profoundly transformed numerous scientific fields. In computational chemistry, machine learning applications have permitted faster predictions of chemical properties and provided powerful analytical tools, facilitating the exploration of the chemical space. The original work presented in this thesis leverages the paradigm-shifting influence of ML and focuses on bridging the divide between unsupervised and supervised learning with the overarching objective of improving the predictive power of similarity-based machine learning algorithms such as kernel regression.
Despite their widespread use in chemistry, current implementations of kernel regression suffer from biased definitions of similarity between chemical environments. This problem originates from the rigidity of current numerical approaches for encoding molecular information, which rely on expert-crafted representations. Moreover, it is amplified by the incorrect (yet widespread) assumption that increasing the amount of information encoded in molecular representations always improves the evaluation of molecular similarity. As a result, kernel models can perform sub-optimally, which limits their broad applicability.
To overcome such limitations, we introduce a series of statistical tools and methodologies based on supervised dimensionality reduction and metric learning capable of filtering and adapting the features of common molecular representations. This allows tailoring the notion of "molecular similarity" in order to optimize the prediction of specific chemical targets.
Using examples such as the exploration of the free-energy landscape of oligopeptides or the prediction of subtle properties associated with the outcome of chemical reactions (for example, enantiomeric excess), we demonstrate how the methods proposed in this thesis unlock the optimal performance of kernel regression and, more generally, of any similarity-based algorithm.
Overall, this work is part of a larger effort aimed at extending the capabilities of computational modeling to increasingly complex chemical situations by exploiting the latest advances in statistical learning.
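
The abstract's central claim, that encoding more information does not necessarily improve similarity-based predictions, can be illustrated with a toy sketch. The example below is not taken from the thesis: the synthetic data, the function names (`gaussian_kernel`, `krr_fit_predict`, `rmse`), and the hand-set feature weights are all assumptions made for illustration. In particular, the thesis learns how to filter and adapt features from data, whereas here the weights are fixed by hand because the toy target is known to depend on a single feature. The sketch uses plain kernel ridge regression with a Gaussian kernel under a diagonal metric.

```python
import numpy as np

def gaussian_kernel(A, B, weights, sigma=1.0):
    """Gaussian kernel under a diagonal metric: each feature is rescaled by `weights`."""
    Aw, Bw = A * weights, B * weights
    sq_dist = (Aw**2).sum(1)[:, None] + (Bw**2).sum(1)[None, :] - 2.0 * Aw @ Bw.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def krr_fit_predict(X_train, y_train, X_test, weights, sigma=1.0, lam=1e-3):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, then predict on X_test."""
    K = gaussian_kernel(X_train, X_train, weights, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return gaussian_kernel(X_test, X_train, weights, sigma) @ alpha

# Hypothetical toy data: one informative feature plus five large-scale noise features.
rng = np.random.default_rng(0)
informative = rng.uniform(-1.0, 1.0, size=(200, 1))
noise = 3.0 * rng.normal(size=(200, 5))
X = np.hstack([informative, noise])
y = np.sin(3.0 * X[:, 0])  # the target depends only on the first feature

X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]

def rmse(weights):
    """Test-set root-mean-square error for a given feature weighting."""
    pred = krr_fit_predict(X_train, y_train, X_test, weights)
    return float(np.sqrt(np.mean((pred - y_test) ** 2)))

# Similarity computed on all six features versus on the informative feature only.
rmse_all = rmse(np.ones(6))
rmse_filtered = rmse(np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
print(f"all features: {rmse_all:.3f}   filtered: {rmse_filtered:.3f}")
```

In this toy setting, the noise features dominate the distances between samples, so the kernel built on all features sees almost every pair as dissimilar and generalizes poorly. Zeroing their weights restores a notion of similarity that tracks the target. A metric-learning approach would aim to find such a weighting automatically rather than by hand.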

Name: EPFL_TH9317.pdf
Type: N/a
Access type: openaccess
License Condition: Copyright
Size: 57.98 MB
Format: Adobe PDF
Checksum (MD5): 464adf7813305df4cf69287cd5a01f21


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved