Robust Out-of-Distribution Prediction of Buchwald-Hartwig Reactions
The Buchwald–Hartwig cross-coupling is a cornerstone of modern pharmaceutical synthesis, yet predictive modeling of its outcomes remains limited by data quality and chemical space coverage. Industry electronic laboratory notebooks (ELNs) contain heterogeneous, noisy records, while open-source high-throughput experimentation (HTE) datasets are fragmented and narrow in scope. As a result, models often fail when applied to novel substrate and condition combinations. Here we introduce a unified framework that systematically standardizes and integrates diverse reaction data into a high-quality, unique-structure-per-entity dataset, coupled with active learning to strategically expand chemical space. By merging published Buchwald–Hartwig HTE data with new experimental results, we obtain models that generalize across substrates and conditions, delivering substantially improved out-of-distribution predictions relative to previous approaches. Crucially, model-guided reagent and condition recommendations were validated experimentally, confirming the framework’s utility for exploring uncharted reactivity. This work establishes a blueprint for robust machine learning in synthetic chemistry, with the potential to accelerate pharmaceutical discovery by enabling more reliable and scalable prediction of reaction outcomes.
Johnson & Johnson (United States)
Johnson & Johnson (United States)
Johnson & Johnson (United States)
Johnson & Johnson (United States)
École Polytechnique Fédérale de Lausanne
Johnson & Johnson (United States)
2025-10-13
American Chemical Society (ACS)
EPFL