Bayesian and Data-Efficient Strategies for the Optimization of Organic Reactions
Efficient optimization of organic reactions remains a central challenge in synthetic chemistry, hindered by the vast combinatorial space of possible reaction conditions and by the limitations of traditional trial-and-error approaches. This thesis advances reaction optimization in homogeneous catalysis by integrating statistical and computational methods across four pillars: data collection and curation, molecular and reaction representation, robust modeling and model selection, and experimental design, with a focus on Bayesian optimization. Emphasis is placed on strategies for low-data regimes, interpretability, and systematic decision-making under uncertainty, and the methods are demonstrated on representative case studies.
The first pillar establishes practical guidance for building statistically robust, machine-learning-ready datasets and discusses in detail the open challenges in streamlining this process. The second pillar develops a reaction-agnostic featurization strategy for bidentate ligands and curates ligand datasets to enable interpretable and transferable modeling. The third pillar introduces ReaFS (reaction feature selection), a systematic methodology for selecting informative features and validating multivariate linear regression models, with a focus on model stability and interpretability when only few samples are available. The fourth pillar extends classical Bayesian optimization by incorporating experimental cost into the acquisition strategy; the resulting cost-informed Bayesian optimization (CIBO) reduces expected expenditure relative to cost-agnostic policies while respecting practical constraints. Selected methods are applied to reaction scope exploration and optimization, including azidofunctionalization and carboetherification reactions. The thesis concludes by outlining future directions for integrating these tools into broader chemical synthesis planning, highlighting the need for standardized practices, rigorous validation, and continued methodological innovation.
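The idea of folding experimental cost into the acquisition step can be illustrated with a minimal sketch. It assumes a surrogate model (for example a Gaussian process) has already produced a posterior mean and standard deviation of the yield for each candidate condition, and it scores candidates by expected improvement (EI) divided by the marginal cost of the reagents that are not yet in stock. The EI-per-cost ratio, the "purchased reagents are free" inventory rule, and the names `Candidate`, `purchase_cost`, `select_next`, and `eps` are illustrative assumptions, not the CIBO acquisition function as defined in the thesis.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate reaction condition with surrogate predictions."""
    name: str
    mu: float             # posterior mean of the yield (assumed given by a surrogate)
    sigma: float          # posterior standard deviation of the yield
    reagents: frozenset   # reagents/ligands this condition would require


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Closed-form expected improvement for maximization under a Gaussian posterior."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best - xi) * cdf + sigma * pdf


def purchase_cost(cand: Candidate, prices: dict, inventory: set) -> float:
    """Assumption: only reagents not already in stock add to the cost of an experiment."""
    return sum(prices[r] for r in cand.reagents if r not in inventory)


def select_next(cands, best, prices, inventory, eps: float = 1.0) -> Candidate:
    """Pick the candidate with the highest EI per unit of marginal cost.

    eps (hypothetical, in currency units) keeps the ratio finite for free candidates.
    """
    def score(c: Candidate) -> float:
        return expected_improvement(c.mu, c.sigma, best) / (purchase_cost(c, prices, inventory) + eps)

    return max(cands, key=score)
```

As a usage example, with `prices = {"L1": 50.0, "L2": 400.0, "Pd": 100.0}`, a Pd source already in stock, and a best observed yield of 0.55, a plain EI policy would prefer a condition using the expensive ligand `L2` (higher predicted yield), whereas the cost-aware score prefers the cheaper `L1` condition. Once `L2` has been bought, its marginal cost drops to zero and the `L2` condition is selected. This inventory effect is one reason why cost-informed and cost-agnostic policies can diverge over a campaign.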