Designing synthetic regulatory elements using the generative AI framework DNA-Diffusion
Systematically designing regulatory elements for precise gene expression control remains a central challenge in genomics and synthetic biology. Here we introduce DNA-Diffusion, a generative artificial intelligence framework that uses machine learning trained on DNA accessibility data from diverse cell lines to design compact regulatory elements with cell-type-specific activity. We show that DNA-Diffusion generates 200-base-pair synthetic elements that recapitulate endogenous transcription factor binding grammar while exhibiting enhanced cell-type specificity. We validated these elements using a 5,850-element STARR-seq library across three cell lines. Moreover, we demonstrated successful endogenous gene modulation using EXTRA-seq, reactivating AXIN2, a leukemia-protective gene, in its native genomic context. Our approach outperforms existing computational methods in balancing functional activity with cell-type specificity while maintaining sequence diversity. This work establishes DNA-Diffusion as a powerful tool for engineering compact, highly specific regulatory elements crucial for advancing gene therapies and understanding gene regulation.
2-s2.0-105025707177
Harvard Medical School
Massachusetts General Hospital
École Polytechnique Fédérale de Lausanne
Harvard Medical School
Victor Chang Cardiac Research Institute
Department of Electrical Engineering and Computer Sciences
Massachusetts General Hospital
UNC School of Medicine
Independent Researcher
McGill University
2025
REVIEWED
EPFL