research article

Multitask methods for predicting molecular properties from heterogeneous data

Fisher, K. E. • Herbst, M. F. • Marzouk, Y. M.
July 7, 2024
The Journal of Chemical Physics

Data generation remains a bottleneck in training surrogate models to predict molecular properties. We demonstrate that multitask Gaussian process regression overcomes this limitation by leveraging both expensive and cheap data sources. In particular, we consider training sets constructed from coupled-cluster (CC) and density functional theory (DFT) data. We report that multitask surrogates can predict at CC-level accuracy with a reduction in data generation cost by over an order of magnitude. Of note, our approach allows the training set to include DFT data generated by a heterogeneous mix of exchange-correlation functionals without imposing any artificial hierarchy on functional accuracy. More generally, the multitask framework can accommodate a wider range of training set structures, including the full disparity between the different levels of fidelity, than existing kernel approaches based on Δ-learning, although we show that the accuracy of the two approaches can be similar. Consequently, multitask regression can be a tool for reducing data generation costs even further by opportunistically exploiting existing data sources.
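To make the idea of combining cheap and expensive data concrete, here is a minimal sketch of multitask Gaussian process regression on a toy 1-D problem. It is not the authors' implementation: it assumes an intrinsic coregionalization kernel (one common multitask construction), a hand-picked task covariance `B`, a fixed lengthscale, and two hypothetical stand-in functions, `cheap` and `expensive`, in place of real DFT and CC data. In practice these hyperparameters would be learned, for example by maximizing the marginal likelihood.

```python
import numpy as np

def rbf(xa, xb, lengthscale=0.5):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def icm_kernel(xa, ta, xb, tb, B, lengthscale=0.5):
    """Intrinsic coregionalization kernel: B[t, t'] * k(x, x')."""
    return B[np.ix_(ta, tb)] * rbf(xa, xb, lengthscale)

def multitask_gp_mean(x_train, t_train, y_train, x_test, t_test, B,
                      noise=1e-4, lengthscale=0.5):
    """Posterior mean of a zero-mean GP with the ICM kernel."""
    K = icm_kernel(x_train, t_train, x_train, t_train, B, lengthscale)
    K = K + noise * np.eye(len(x_train))  # noise term also stabilizes Cholesky
    K_star = icm_kernel(x_test, t_test, x_train, t_train, B, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return K_star @ alpha

# Hypothetical stand-ins for a cheap (task 0) and an expensive (task 1) method.
def cheap(x):
    return np.sin(3.0 * x)

def expensive(x):
    return np.sin(3.0 * x) + 0.3 * x

# Many cheap samples, only three expensive ones.
x_cheap = np.linspace(0.0, 2.0, 20)
x_exp = np.array([0.0, 1.0, 2.0])

x_train = np.concatenate([x_cheap, x_exp])
t_train = np.concatenate([np.zeros(len(x_cheap), dtype=int),
                          np.ones(len(x_exp), dtype=int)])
y_train = np.concatenate([cheap(x_cheap), expensive(x_exp)])

# Assumed task covariance: the two tasks are strongly but not perfectly correlated.
B = np.array([[1.0, 0.9],
              [0.9, 1.0]])

x_test = np.linspace(0.0, 2.0, 50)
t_test = np.ones(len(x_test), dtype=int)  # predict the expensive task

# Multitask prediction uses both data sources.
pred_multi = multitask_gp_mean(x_train, t_train, y_train, x_test, t_test, B)

# Single-task baseline sees only the three expensive samples.
pred_single = multitask_gp_mean(x_exp, np.ones(len(x_exp), dtype=int),
                                expensive(x_exp), x_test, t_test, B)

truth = expensive(x_test)
rmse_multi = float(np.sqrt(np.mean((pred_multi - truth) ** 2)))
rmse_single = float(np.sqrt(np.mean((pred_single - truth) ** 2)))
print(f"multitask RMSE: {rmse_multi:.3f}, single-task RMSE: {rmse_single:.3f}")
```

Because the task index enters only through `B`, more cheap tasks (for instance, several exchange-correlation functionals) could be added by enlarging `B`, without ranking them by accuracy.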

Type: research article
DOI: 10.1063/5.0201681
Scopus ID: 2-s2.0-85197682945
PubMed ID: 38958501
Author(s):
  • Fisher, K. E. (MIT School of Engineering)
  • Herbst, M. F. (École Polytechnique Fédérale de Lausanne)
  • Marzouk, Y. M. (MIT School of Engineering)
Date Issued: 2024-07-07
Published in: The Journal of Chemical Physics
Volume: 161
Issue: 1
Article Number: 014114
Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: MATMAT1
Funding:
  • NCCR MARVEL
  • Department of Energy
  • National Centre of Competence in Research
Available on Infoscience: January 24, 2025
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/243427