Abstract

Discovering new materials is essential but challenging, time-consuming, and expensive. In many cases, simulations can be useful for estimating material properties. For many of the most interesting properties, however, simulations are infeasible, either because of prohibitive costs or because it is unknown how to set up a suitable simulation. A promising alternative for reducing the cost of predicting material properties, or for making such estimates possible in the first place, is to learn mappings from materials or processes to properties from data. While the feasibility of this approach is supported by the observation that chemists learn from experience (intuition), such a data-intensive approach faces unique challenges. First, it relies on suitable data. Second, appropriate tooling is needed to support a research practice in which findings can be easily compared and reused. Third, crystal structures often must first be converted into suitable inputs for machine-learning models (so-called featurization). In the first part of this thesis, we present tools that address all of these challenges. Using open-source electronic lab notebooks, we capture data in a machine-actionable form. We then show how to use tooling from our “ecosystem for reticular chemistry,” which provides datasets, data splitters, featurizers, and benchmark utilities, to build, compare, and publish machine-learning models. A subsequent chapter highlights how such machine-learning models can guide the choice of which experiments or simulations to perform next, particularly in the multiobjective setting, which is relevant for most material design problems. The second part of this thesis applies this data-driven toolbox to problems ranging from the atom scale to the pilot-plant scale.
On the atom scale, we show that chemically sensible features can be used to predict the oxidation states of metal cations, a property at the heart of chemistry that is nevertheless not a quantum-mechanical observable. On the pilot-plant scale, we address how the operation of a carbon-capture plant affects the emissions of its capture solvent. Perhaps surprisingly, this has remained an open question, because the process is so complex that it is not known how to set up the corresponding process simulations. As in the case of the oxidation states, an inductive, data-driven approach is not constrained by this limitation and could therefore give us insight into how the solvent emissions behave as a function of the operating conditions. One underlying theme of the work presented in this thesis is that it is not computational chemists but their experimental colleagues who could benefit most from predictions enabled by machine-learning models. One fascinating development that might help make machine learning more accessible is the emergence of so-called foundation models. The closing chapter shows that such models can be fine-tuned on only a few examples to achieve competitive performance across many tasks in chemistry and materials science. However, most of these models are black boxes, and combining them with the reasoning of experienced chemists and with additional background knowledge will likely yield the most progress. Together with the progress made thus far, this suggests that machine learning might have a larger impact on chemistry than on many other domains, such as computer vision.