Compilation and analysis of eukaryotic POL II promoter sequences

A representative set of 168 eukaryotic POL II promoters has been compiled from the EMBL library and subjected to computer signal search analysis. Applied to E. coli promoters as a control set, the same technique recovered the well-known consensus sequences at -35 and -10. This indicates that the methods are adequate for problems of this kind. The results obtained from the eukaryotic promoter set can be summarized as follows. Common sequence features are confined to the region between -50 and +10 relative to the transcription initiation site. The only well-conserved consensus sequence is TATAAA, centered at -28. A weak motif, CA followed preferentially by pyrimidines, surrounds the cap site. Two pentanucleotides that have been shown experimentally to stimulate transcription of certain genes, GGGCG and CCAAT, are moderately over-represented in the upstream region (between -129 and -50). However, they occur at highly variable distances from the initiation site.
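The abstract does not describe how the signal search is carried out. The Python sketch below is only a rough illustration of the kind of positional tally behind these results. It is not the authors' signal search analysis. It counts where a motif (optionally allowing mismatches) starts relative to the initiation site across aligned promoters, and it compares observed with expected exact counts of a motif inside an upstream window. Several choices are assumptions made for this sketch: every sequence is a pre-aligned string with the +1 nucleotide at the same index `tss_index`, only the given strand is scanned, and expected counts use a mononucleotide background estimated from the window itself. The function names are hypothetical.

```python
from collections import Counter

# Assumed convention (hypothetical): every promoter string is aligned so that
# the +1 nucleotide (transcription start) sits at the same index, tss_index.
# Promoter numbering has no position 0: ..., -2, -1, +1, +2, ...


def index_to_position(i, tss_index):
    """Convert a 0-based string index to a promoter coordinate."""
    offset = i - tss_index
    return offset + 1 if offset >= 0 else offset


def position_to_index(p, tss_index):
    """Convert a promoter coordinate (never 0) to a 0-based string index."""
    if p == 0:
        raise ValueError("promoter coordinates skip 0")
    return tss_index + p if p < 0 else tss_index + p - 1


def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def motif_positions(seqs, motif, tss_index, max_mismatches=0):
    """Count motif hits, keyed by the coordinate of each hit's first base."""
    counts = Counter()
    k = len(motif)
    for s in seqs:
        for i in range(len(s) - k + 1):
            if mismatches(s[i:i + k], motif) <= max_mismatches:
                counts[index_to_position(i, tss_index)] += 1
    return counts


def window_enrichment(seqs, motif, tss_index, start, end):
    """Return (observed, expected) exact motif counts inside [start, end].

    Expected counts assume independent bases with frequencies taken from
    the window itself (an illustrative model, not taken from the source).
    """
    lo = position_to_index(start, tss_index)
    hi = position_to_index(end, tss_index)
    if lo < 0:
        raise ValueError("window starts upstream of the sequence data")
    k = len(motif)
    observed, slots, bases = 0, 0, Counter()
    for s in seqs:
        w = s[lo:hi + 1]
        bases.update(w)
        n = max(len(w) - k + 1, 0)  # motif start positions inside the window
        slots += n
        observed += sum(1 for i in range(n) if w[i:i + k] == motif)
    total = sum(bases[b] for b in "ACGT")
    if total == 0:
        return observed, 0.0
    p = 1.0
    for b in motif:
        p *= bases[b] / total
    return observed, slots * p
```

With a list of aligned promoters, `motif_positions(promoters, "TATAAA", tss_index, max_mismatches=1)` shows whether hits cluster around -30. `window_enrichment(promoters, "CCAAT", tss_index, -129, -50)` gives observed and expected counts for the upstream window, provided every sequence extends at least 129 bases upstream of +1.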


