Recognizing molecular patterns by machine learning: An agnostic structural definition of the hydrogen bond
The concept of chemical bonding can ultimately be seen as a rationalization of the recurring structural patterns observed in molecules and solids. Chemical intuition is nothing but the ability to recognize and predict such patterns, and how they transform into one another. Here, we discuss how to use a computer to identify atomic patterns automatically, so as to provide an algorithmic definition of a bond based solely on structural information. We concentrate in particular on hydrogen bonding a central concept to our understanding of the physical chemistry of water, biological systems, and many technologically important materials. Since the hydrogen bond is a somewhat fuzzy entity that covers a broad range of energies and distances, many different criteria have been proposed and used over the years, based either on sophisticate electronic structure calculations followed by an energy decomposition analysis, or on somewhat arbitrary choices of a range of structural parameters that is deemed to correspond to a hydrogen-bonded configuration. We introduce here a definition that is univocal, unbiased, and adaptive, based on our machine-learning analysis of an atomistic simulation. The strategy we propose could be easily adapted to similar scenarios, where one has to recognize or classify structural patterns in a material or chemical compound. (C) 2014 AIP Publishing LLC.