Generative structure-based design of synthetically accessible small molecules
Early-stage drug discovery has long relied on screening-based methods, which are inherently limited by the size of available compound libraries. Even the largest virtual screening libraries cover only a tiny fraction of the astronomically large space of all theoretically possible drug-like molecules.
In my thesis, I explore an alternative computational drug discovery strategy: generative structure-based drug design. Instead of searching a limited collection of available molecules, we propose using deep neural networks that learn the underlying distribution of chemical data and can generate thousands of chemically novel, custom compounds for a specific protein in a matter of seconds.
The first part of my thesis focuses on fragment-based drug discovery, a rational design paradigm in which a drug candidate is assembled piece by piece, starting from potent molecular fragments. We introduce DiffLinker, an equivariant 3D-conditional diffusion model for molecular linker design, and show that it outperforms earlier linker design methods and can be applied to design potent inhibitors for cancer and neurodegenerative disease targets.
To further explore the capabilities of molecular generative models, we then address de novo design. We propose DrugFlow, a generative model for structure-based drug design that integrates multiple data domains and demonstrates state-of-the-art performance in learning the chemical, geometric, and physical aspects of three-dimensional protein-ligand data. In this work, we primarily explore the model's ability to learn the complex distribution of molecular data and to align it with relevant optimization objectives.
One of the limitations of generative drug design that hinders its large-scale practical application is the low synthetic accessibility of designed molecules. Due to the high novelty of the generated molecules, only a small fraction of them can be easily synthesized and thus experimentally validated. To address this problem, we study a probabilistic formulation of the retrosynthesis modeling task and introduce RetroBridge, a template-free generative retrosynthesis approach that achieves state-of-the-art results on standard evaluation benchmarks and helps find synthetic routes for newly designed small molecules.
Finally, we develop LDDM (Large Drug Discovery Model), a unified generative platform for structure-based drug discovery. In addition to de novo drug design, LDDM can solve more constrained tasks, including fragment-based design and molecular docking. We computationally benchmark LDDM across various relevant tasks and experimentally validate LDDM-designed molecules for multiple therapeutically relevant targets.
Our findings establish a new paradigm in early-stage drug discovery, in which a generative engine effectively explores a vast chemical space and proposes novel custom compounds for a target protein. The ability to tailor the design process to a specific chemical subspace compatible with an available synthesis framework bridges computational drug design and large-scale experimental validation. This opens new frontiers for developing next-generation computational methods with continuous experimental feedback and a lab in the loop.