Robust Out-of-Distribution Prediction of Buchwald–Hartwig Reactions
The Buchwald–Hartwig cross-coupling is a cornerstone of modern pharmaceutical synthesis, yet predictive modeling of its outcomes remains limited by data quality and chemical space coverage. Industry electronic laboratory notebooks (ELNs) contain heterogeneous, noisy records, while open-source high-throughput experimentation (HTE) datasets are fragmented and narrow in scope. As a result, models often fail when applied to novel combinations of substrates and conditions. Here we introduce a unified framework that systematically standardizes and integrates diverse reaction data into a high-quality, unique-structure-per-entity dataset, and couples it with active learning to strategically expand chemical space. By merging published Buchwald–Hartwig HTE data with new experimental results, we obtain models that generalize across substrates and conditions and deliver substantially improved out-of-distribution predictions relative to previous approaches. Crucially, model-guided reagent and condition recommendations were validated experimentally, confirming the framework’s utility for exploring previously untested reactivity. This work establishes a blueprint for robust machine learning in synthetic chemistry, with the potential to accelerate pharmaceutical discovery by enabling more reliable and scalable prediction of reaction outcomes.
Submitted version (Preprint)
Open access
License: CC BY-NC-ND