Infoscience

doctoral thesis

Structure-Property Relationships in Complex Materials by Combining Supervised and Unsupervised Machine Learning

Helfrecht, Benjamin Aaron  
2021

The work presented in this thesis combines supervised and unsupervised machine learning to examine structure-property relationships in databases of materials. While supervised or unsupervised learning alone can be a powerful tool for assessing materials and their properties, the focus here is to demonstrate the utility of combining the two to gain actionable insight about complex materials, whether through a unified approach or in sequential workflows. To this end, combined supervised-unsupervised learning schemes are applied to two examples, each focusing on a different class of materials.

The first is an analysis of hydrogen-bonding and backbone dihedral-angle motifs in protein crystal structures from the Protein Data Bank. It demonstrates that data-driven definitions of structural motifs obtained through unsupervised learning can be more detailed and precise than conventional heuristics, and that these definitions can be validated through supervised learning. We found that the motifs identified using a Gaussian mixture model largely agreed with more "traditional" definitions but were more precise for edge cases. Furthermore, outside the better-defined secondary structure motifs such as helices and sheets, several conventional secondary structure definitions did not coincide with the observed data-driven motifs, suggesting that the heuristic definitions of less-ordered secondary structure motifs do not closely reflect the distribution of structural patterns in protein crystal structures in the Protein Data Bank. At the same time, the configuration space of proteins also contains clear, though as-yet unnamed, motifs.
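The abstract does not spell out how a Gaussian mixture model turns a distribution of structural descriptors into discrete motifs. As a hedged toy illustration, the sketch below fits a two-component, one-dimensional mixture by expectation-maximization (EM) to synthetic data and assigns each sample to its most probable component. The single descriptor, the synthetic data, and the helper names (`fit_gmm_1d`, `normal_logpdf`) are hypothetical; the descriptors, dimensionality, number of components, and fitting procedure used in the thesis are not given here.

```python
import math
import random


def normal_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gmm_1d(data, n_components=2, n_iter=200, tol=1e-8):
    """Fit a 1D Gaussian mixture model by EM (hypothetical helper, not from the thesis).

    Returns (weights, means, variances, labels), where labels[i] is the index of
    the component with the highest responsibility for data[i] in the last E-step.
    """
    n = len(data)
    lo, hi = min(data), max(data)
    # Spread the initial means evenly over the data range; start every
    # component with the overall variance and an equal weight.
    means = [lo + (hi - lo) * (k + 0.5) / n_components for k in range(n_components)]
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * n_components
    weights = [1.0 / n_components] * n_components
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities P(component k | x), normalized with log-sum-exp.
        resp, ll = [], 0.0
        for x in data:
            logs = [math.log(w) + normal_logpdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            log_norm = top + math.log(sum(math.exp(lp - top) for lp in logs))
            ll += log_norm
            resp.append([math.exp(lp - log_norm) for lp in logs])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing component
        # Stop once the log-likelihood no longer changes appreciably.
        if abs(ll - prev_ll) < tol * abs(ll):
            break
        prev_ll = ll
    # Hard "motif" assignment: most probable component for each sample.
    labels = [max(range(n_components), key=lambda k: r[k]) for r in resp]
    return weights, means, variances, labels


# Synthetic stand-in for a structural descriptor with two populations (made-up data).
rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) for _ in range(500)] + [rng.gauss(3.0, 0.7) for _ in range(500)]
weights, means, variances, labels = fit_gmm_1d(data)
for w, m, v in sorted(zip(weights, means, variances), key=lambda t: t[1]):
    print(f"weight={w:.2f} mean={m:.2f} std={math.sqrt(v):.2f}")
```

Each fitted component plays the role of a data-driven motif. Real applications would use multi-dimensional descriptors and pick the number of components with a model-selection criterion, which this sketch does not attempt.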

The second example centers on the exploration of structure-property relationships in all-silica zeolites, with the ultimate aim of finding new zeolite frameworks that might be experimentally synthesizable. We begin by constructing a map of atom-centered environments in a database of hypothetical zeolite frameworks based on principal component analysis. We validate our choice of "cardinal directions" by showing that they correlate with the predicted atomic contributions to the molar volume and energy of the frameworks, while also emphasizing the diversity of the structural space. We then extend this exploration to a supervised classification exercise that distinguishes hypothetical zeolite frameworks from those that have been experimentally synthesized; hypothetical frameworks that share several structural characteristics with synthesized ones are likely to be misclassified and may therefore serve as promising synthesis candidates. To further filter these candidates based on their thermodynamic stability, we apply a convex hull construction based on a measure of classification prediction strength and the lattice energies of the frameworks. Through this combined supervised-unsupervised learning workflow, we propose a collection of hypothetical zeolites as likely candidates for experimental synthesis.
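The abstract does not detail the convex hull step either. One plausible reading, sketched below under explicit assumptions, places each hypothetical framework in a plane spanned by a classifier score and its lattice energy, builds the lower convex hull of energy as a function of score with Andrew's monotone chain algorithm, and keeps frameworks that lie on or near that hull and that the classifier also scores as "synthesized-like". The sign convention (score > 0 meaning predicted synthesized), the lower-hull orientation, the tolerance `tol`, the helper names, and the framework names and values are all hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of 2D (score, energy) points, left to right (monotone chain)."""
    hull = []
    for p in sorted(set(points)):
        # Pop the last vertex while it does not make a counter-clockwise turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    # Drop a trailing vertical segment (several points sharing the largest score).
    while len(hull) >= 2 and hull[-1][0] == hull[-2][0]:
        hull.pop()
    return hull


def height_above_hull(point, hull):
    """Vertical distance (in energy) of a point above the piecewise-linear lower hull."""
    x, y = point
    if len(hull) == 1:
        return y - hull[0][1]
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 < x1 and x0 <= x <= x1:
            return y - (y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    raise ValueError("point lies outside the score range of the hull")


def hull_candidates(frameworks, tol=0.5):
    """Select frameworks within `tol` of the lower hull that have a positive score.

    `frameworks` maps a name to (classifier score, lattice energy). Treating
    score > 0 as "predicted synthesized" is an assumption of this sketch.
    """
    hull = lower_hull(list(frameworks.values()))
    heights = {name: height_above_hull(p, hull) for name, p in frameworks.items()}
    selected = [name for name, (score, _) in frameworks.items()
                if score > 0.0 and heights[name] <= tol]
    return selected, heights


# Hypothetical frameworks: (classifier score, lattice energy); values are made up.
frameworks = {
    "hyp-A": (-2.0, 18.0),
    "hyp-B": (-1.0, 14.0),
    "hyp-C": (0.0, 16.0),
    "hyp-D": (0.5, 12.0),
    "hyp-E": (1.5, 15.0),
    "hyp-F": (2.0, 13.0),
}
candidates, heights = hull_candidates(frameworks)
print(candidates)  # ['hyp-D', 'hyp-F']
```

Vertices of a lower hull are the points that minimize energy minus some multiple of the score, so this construction keeps frameworks that are optimal for some trade-off between stability and classifier confidence rather than for a single fixed weighting.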

These two examples show that by combining supervised and unsupervised learning, it is possible to gain deeper insight into the structure-property relationships in a wide array of materials than through either set of methods in isolation.

Type
doctoral thesis
DOI
10.5075/epfl-thesis-9032
Author(s)
Helfrecht, Benjamin Aaron  
Advisors
Ceriotti, Michele  
Jury
Prof. Véronique Michaud (president); Prof. Michele Ceriotti (thesis director); Prof. Matteo Dal Peraro, Prof. Veronique Van Speybroeck, Prof. François-Xavier Coudert (examiners)
Date Issued
2021
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2021-07-12
Thesis number
9032
Total pages
154
Subjects
machine learning • supervised learning • unsupervised learning • structure-property relationships • hydrogen bonds • proteins • zeolites
EPFL units
COSMO  
Faculty
STI  
School
IMX  
Doctoral School
EDMX  
Available on Infoscience
July 5, 2021
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/179790
Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.