MolCraftDiffusion: 3D Molecular Generation Framework for Data-driven Molecular Applications
Equivariant diffusion models have recently been explored for 3D molecular structure generation, yet their practical deployment is hindered by two major challenges: the high cost of training and the lack of standardized guidance methods for steering generation toward desired geometries or properties. Existing guidance approaches are typically task-specific, which limits their general applicability. We introduce MolCraftDiffusion, a generative AI framework for building molecular diffusion models and adapting them to data-driven applications in computational chemistry. The package implements curriculum learning, a training strategy that progressively increases chemical complexity, which we use to build a diffusion model pre-trained on 3D molecular structure datasets compiled from multiple sources. This pre-trained model removes the need for costly training from scratch each time the framework is applied to a new problem. The framework also includes modular guided-diffusion methods that steer generation toward chemically relevant goals, such as preserving a specific 3D chemical structure and achieving specific molecular properties. Structure guidance uses molecular inpainting (systematic exploration of structural variants around a reference molecule) and outpainting (extending existing molecules with new chemical groups). Target guidance employs gradient-based and classifier-free methods to direct generation toward molecules with desired physicochemical properties. We illustrate these guided-generation capabilities on computational chemistry tasks such as virtual library construction and inverse molecular design. The framework is implemented as a comprehensive software package intended to make 3D molecular generation models easy to adopt. The codebase, pre-trained models, and examples are available at: https://github.com/pregHosh/MolCraftDiffusion and https://huggingface.co/pregH/MolecularDiffusion.
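The abstract names the guidance mechanisms but not their update rules. The sketch below shows common textbook forms of each on toy numbers, in plain Python: classifier-free guidance as a weighted mix of conditional and unconditional noise predictions, gradient (classifier) guidance in the noise parameterization using the gradient of a Gaussian likelihood around a target property value, and replacement-style inpainting (in the spirit of RePaint) where known atoms are reset to a noised copy of the reference at every reverse step. These are assumptions for illustration: all function names are hypothetical, this is not MolCraftDiffusion's API, and the package may use different rules. Real equivariant models also denoise atom types and keep positions in the zero center-of-mass subspace, which this sketch omits.

```python
import math
import random


def classifier_free_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance: eps = (1 + w) * eps_cond - w * eps_uncond.

    w = 0 returns the conditional prediction unchanged; larger w pushes the
    sample further toward the conditioning signal (e.g. a target property).
    """
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]


def gaussian_target_grad(pred, dpred_dx, target, sigma_y=1.0):
    """Gradient of log N(target | pred(x), sigma_y^2) with respect to x.

    pred is a (hypothetical) property predictor's output on the noisy sample
    and dpred_dx its gradient; the result points toward samples whose
    predicted property is closer to target.
    """
    coef = (target - pred) / sigma_y ** 2
    return [coef * d for d in dpred_dx]


def gradient_guided_eps(eps, grad_log_p, alpha_bar_t, scale):
    """Gradient guidance in the noise parameterization:
    eps_hat = eps - scale * sqrt(1 - alpha_bar_t) * grad_x log p(y | x_t).
    """
    sigma_t = math.sqrt(1.0 - alpha_bar_t)
    return [e - scale * sigma_t * g for e, g in zip(eps, grad_log_p)]


def inpaint_merge(x_t, x0_ref, known_atoms, alpha_bar_t, rng):
    """Replacement-style inpainting on per-atom 3D positions.

    Atoms flagged as known are overwritten with the reference positions noised
    to the current level t (forward process); free atoms keep the model's sample.
    """
    a, s = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    merged = []
    for pos, ref, known in zip(x_t, x0_ref, known_atoms):
        if known:
            merged.append([a * r + s * rng.gauss(0.0, 1.0) for r in ref])
        else:
            merged.append(list(pos))
    return merged


# Toy usage on made-up numbers (not from the paper).
rng = random.Random(0)
print(classifier_free_eps([0.2, -0.1, 0.4], [0.1, 0.0, 0.2], w=2.0))
grad = gaussian_target_grad(pred=1.0, dpred_dx=[1.0, 2.0], target=3.0)
print(gradient_guided_eps([0.0, 0.0], grad, alpha_bar_t=0.75, scale=1.0))
x_t = [[9.0, 9.0, 9.0], [1.0, 2.0, 3.0]]
ref = [[0.1, 0.2, 0.3], [0.0, 0.0, 0.0]]
print(inpaint_merge(x_t, ref, [True, False], alpha_bar_t=0.5, rng=rng))
```

In a full sampler these functions would be called once per reverse step. Under this sketch, outpainting could reuse `inpaint_merge` with all reference atoms marked known and extra free atoms appended to `x_t`; the abstract does not say whether MolCraftDiffusion implements it this way.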
Main document: submitted version (preprint), open access, license CC BY-NC.