Cross-platform motif discovery and benchmarking to explore binding specificities of poorly studied human transcription factors
A sequence motif representing the DNA-binding specificity of a transcription factor (TF) is commonly modelled with a positional weight matrix (PWM). Focusing on understudied human TFs, we processed the results of 4,237 experiments for 394 TFs, assayed using five different experimental platforms. By human curation, we approved a subset of experiments that yielded consistent motifs across platforms and replicates, and quantitatively evaluated the cross-platform performance of PWMs obtained with ten motif discovery tools. Notably, nucleotide composition and information content are not correlated with motif performance and do not help in detecting underperformers, while motifs with low information content, in many cases, describe well the binding specificity assessed across different experimental platforms. By combining multiple PWMs into a random forest, we demonstrate the potential of accounting for multiple modes of TF binding. Finally, we present the Codebook Motif Explorer (https://mex.autosome.org), cataloguing motifs, benchmarking results, and the underlying experimental data.
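For readers unfamiliar with PWMs, the following is a minimal Python sketch of how such a model can be built from per-position nucleotide counts, how its information content can be computed, and how it can score a DNA sequence. It is an illustration only and not the pipeline used in the study: the function names, the pseudocount of 1, the uniform background of 0.25, and the toy `TGAC` count matrix are all assumptions introduced here.

```python
import math

ALPHABET = "ACGT"

def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position nucleotide counts into log2-odds weights.

    Assumes a uniform background and a fixed pseudocount (illustrative choices).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in ALPHABET) + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in ALPHABET
        })
    return pwm

def information_content(counts, pseudocount=1.0, background=0.25):
    """Total information content of the motif in bits, relative to the background."""
    ic = 0.0
    for column in counts:
        total = sum(column[b] for b in ALPHABET) + 4 * pseudocount
        for b in ALPHABET:
            p = (column[b] + pseudocount) / total
            ic += p * math.log2(p / background)
    return ic

def best_hit(pwm, sequence):
    """Return (score, offset) of the best-scoring window on the forward strand.

    Returns None if the sequence is shorter than the motif. Only A, C, G, T
    are handled; other characters raise KeyError in this sketch.
    """
    width = len(pwm)
    best = None
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best

# Hypothetical toy motif with consensus TGAC (20 aligned sites, made-up data).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 20},
    {"A": 0, "C": 0, "G": 20, "T": 0},
    {"A": 20, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 20, "G": 0, "T": 0},
]
toy_pwm = counts_to_pwm(toy_counts)
toy_score, toy_offset = best_hit(toy_pwm, "AATGACGT")
toy_ic = information_content(toy_counts)
```

A practical scanner would typically also score the reverse-complement strand and use a background estimated from the data; both are omitted here to keep the sketch short.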
10.1038_s42003-025-08909-9.pdf
Main Document
Published version
openaccess
CC BY