Statistical learning quantifies transposable element-mediated cis-regulation
Background: Transposable elements (TEs) have colonized the genomes of most metazoans, and many TE-embedded sequences function as cis-regulatory elements (CREs) for genes involved in a wide range of biological processes from early embryogenesis to innate immune responses. Because of their repetitive nature, TEs have the potential to form CRE platforms enabling the coordinated and genome-wide regulation of protein-coding genes by only a handful of trans-acting transcription factors (TFs).
Results: Here, we directly test this hypothesis through mathematical modeling and demonstrate that differences in expression at protein-coding genes alone are sufficient to estimate the magnitude and significance of TE-contributed cis-regulatory activities, even in contexts where TE-derived transcription fails to do so. We leverage hundreds of overexpression experiments and estimate that, overall, gene expression is influenced by TE-embedded CREs situated within approximately 500 kb of promoters. Focusing on the cis-regulatory potential of TEs within the gene regulatory network of human embryonic stem cells, we find that pluripotency-specific and evolutionarily young TE subfamilies can be reactivated by TFs involved in post-implantation embryogenesis. Finally, we show that TE subfamilies can be split into truly regulatorily active versus inactive fractions based on additional information such as matched epigenomic data, observing that TF binding may better predict TE cis-regulatory activity than differences in histone marks.
Conclusion: Our results suggest that TE-embedded CREs contribute to gene regulation during and beyond gastrulation. On a methodological level, we provide a statistical tool that infers TE-dependent cis-regulation from RNA-seq data alone, thus facilitating the study of TEs in the next-generation sequencing era.
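The abstract does not spell out the model, so the following is only a minimal sketch of the general idea: estimating per-subfamily cis-regulatory "activities" from differences in protein-coding gene expression. It assumes an ordinary least-squares fit of expression differences against counts of TE integrants near each promoter (for instance within the approximately 500 kb window mentioned above). The function name `estimate_te_activities`, the count matrix, and the simulated data are all hypothetical and are not taken from the published tool.

```python
import numpy as np

def estimate_te_activities(delta_expr, te_counts):
    """Fit delta_expr ~ intercept + te_counts @ activities by least squares.

    delta_expr: (n_genes,) expression differences between two conditions.
    te_counts:  (n_genes, n_subfamilies) hypothetical counts of TE integrants
                from each subfamily near each gene's promoter.
    Returns (activities, t_stats), one value per subfamily.
    """
    n_genes = te_counts.shape[0]
    # Design matrix with a leading intercept column.
    X = np.column_stack([np.ones(n_genes), te_counts])
    beta, _, _, _ = np.linalg.lstsq(X, delta_expr, rcond=None)
    # Classical OLS standard errors from the residual variance.
    resid = delta_expr - X @ beta
    dof = n_genes - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stats = beta / np.sqrt(np.diag(cov))
    # Drop the intercept; keep one activity and t-statistic per subfamily.
    return beta[1:], t_stats[1:]

# Simulated data (an assumption for illustration, not real measurements).
rng = np.random.default_rng(0)
n_genes, n_subfam = 2000, 5
te_counts = rng.poisson(1.0, size=(n_genes, n_subfam)).astype(float)
true_activities = np.array([0.5, 0.0, -0.3, 0.0, 0.0])
delta_expr = te_counts @ true_activities + rng.normal(0.0, 1.0, n_genes)

activities, t_stats = estimate_te_activities(delta_expr, te_counts)
for k in range(n_subfam):
    print(f"subfamily {k}: activity={activities[k]:+.3f}, t={t_stats[k]:+.1f}")
```

In this toy setting, the fitted coefficient gives the magnitude of a subfamily's contribution and the t-statistic gives a rough measure of its significance. A real analysis would need to handle correlated subfamilies, window choice, and multiple testing, none of which are addressed here.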
s13059-023-03085-7.pdf
publisher
openaccess
CC BY