
Infoscience

research article

Genetic Optimization of Training Sets for Improved Machine Learning Models of Molecular Properties

Browning, Nicholas J. • Ramakrishnan, Raghunathan • Von Lilienfeld, O. Anatole • Roethlisberger, Ursula
2017
The Journal of Physical Chemistry Letters

Training statistical machine learning models of quantum mechanical molecular properties requires large data sets that exemplify the map from chemical structure to molecular property. Intelligent a priori selection of training examples is often difficult or impossible, because the necessary prior knowledge may be unavailable. Ordinarily, a representative selection of training molecules from such data sets is obtained by random sampling. We use genetic algorithms to optimize the composition of training sets drawn from tens of thousands of small organic molecules. The resulting machine learning models are considerably more accurate: in the limit of small training sets, mean absolute errors of out-of-sample predictions are reduced by up to ~75%. We discuss and present optimized training sets consisting of 10 molecular classes for all molecular properties studied. We show that these classes can be used to design improved training sets for generating machine learning models of the same properties in similar but unrelated molecular sets.
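The abstract's core idea is to let a genetic algorithm choose which examples enter a training set, instead of sampling them at random, and to score each candidate set by its out-of-sample mean absolute error (MAE). The sketch below illustrates that idea under loud assumptions, none of which come from the paper: molecules are replaced by synthetic 1-D points with a hypothetical property y = sin(3x) + 0.5x; the model is a Gaussian-kernel weighted average (Nadaraya-Watson regression), a simple stand-in rather than the paper's model; and the operators (truncation selection, union-based crossover, swap mutation, elitism) are generic GA choices. All function names and parameter values are hypothetical.

```python
import math
import random


def make_data(n, rng):
    """Hypothetical 1-D 'property' y = sin(3x) + 0.5x standing in for molecules."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    return [(x, math.sin(3.0 * x) + 0.5 * x) for x in xs]


def predict(train, x, width=0.3):
    """Stand-in model: Gaussian-kernel weighted average of training targets."""
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * width ** 2)) for xi, _ in train]
    total = sum(weights)
    if total == 0.0:  # every kernel underflowed: fall back to the mean target
        return sum(y for _, y in train) / len(train)
    return sum(w * y for w, (_, y) in zip(weights, train)) / total


def mae(subset, pool, validation):
    """Out-of-sample mean absolute error of a model trained on pool[subset]."""
    train = [pool[i] for i in subset]
    return sum(abs(predict(train, x) - y) for x, y in validation) / len(validation)


def crossover(a, b, k, rng):
    """Child takes k distinct pool indices drawn from the union of both parents."""
    return rng.sample(sorted(set(a) | set(b)), k)


def mutate(child, n, rng, rate=0.3):
    """With probability `rate`, replace one index by a pool index not yet used."""
    child = list(child)
    if rng.random() < rate:
        used = set(child)
        unused = [i for i in range(n) if i not in used]
        child[rng.randrange(len(child))] = rng.choice(unused)
    return child


def genetic_select(pool, validation, k, pop_size=30, generations=40, seed=0):
    """Evolve size-k training subsets of `pool` to minimise validation MAE."""
    rng = random.Random(seed)
    n = len(pool)

    def fitness(subset):
        return mae(subset, pool, validation)

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    history = []  # best MAE per generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        history.append(fitness(ranked[0]))
        children = ranked[: pop_size // 5]  # elitism: best 20% survive unchanged
        parents = ranked[: pop_size // 2]   # truncation selection: top half breeds
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, k, rng), n, rng))
        population = children
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history


data_rng = random.Random(42)
pool = make_data(200, data_rng)        # candidate training "molecules"
validation = make_data(100, data_rng)  # held-out set for out-of-sample MAE
k = 10                                 # small training set, the regime with the largest reported gains

best, history = genetic_select(pool, validation, k)
random_subset = random.Random(1).sample(range(len(pool)), k)
print(f"random subset MAE: {mae(random_subset, pool, validation):.3f}")
print(f"GA-selected MAE:   {mae(best, pool, validation):.3f}")
```

Because the best 20% of each generation is copied forward unchanged, the best MAE recorded in `history` can never increase from one generation to the next. How much the GA-selected subset beats the random one in this toy setting depends on the seeds and says nothing about the ~75% reduction reported for the real molecular data.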

Type: research article
DOI: 10.1021/acs.jpclett.7b00038
Web of Science ID: WOS:000398884800005
Author(s): Browning, Nicholas J.; Ramakrishnan, Raghunathan; Von Lilienfeld, O. Anatole; Roethlisberger, Ursula
Date issued: 2017
Published in: The Journal of Physical Chemistry Letters
Volume: 8
Issue: 7
Start page: 1351
End page: 1359
Editorial or peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LCBC
Available on Infoscience: May 30, 2017
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/137977

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved