Correlation analysis of amino acid usage in protein classes
We present a comparative study of residue usage correlations in the protein sets of phylogenetically diverse organisms and in the open reading frames of several large human viral genomes. The correlation analysis reveals three major tendencies: (i) charge compensation, reflected in the high correlation of basic with acidic residues; (ii) positive correlations between functionally and structurally similar amino acids, including many pairs of hydrophobic amino acids, all pairs of aromatic amino acids, the anionic pair (glutamate and aspartate) but not the cationic pair (lysine and arginine), moderately the hydroxyl pair (serine and threonine), the small amino acids (glycine and alanine), and many, but not all, of the pairs with high values in the Dayhoff substitutability matrix (characteristics such as amino acid polarity or codon usage agreement, except for the wobble position, do not necessarily imply significant positive correlations); (iii) a widespread negative correlation of the aggregate strong codon group amino acids (Ala, Gly, Pro) with the weak codon group amino acids (Lys, Ile, Tyr, Asn, Phe). The discussion and speculations relate amino acid usage correlations to protein function and structure, cellular localization, proximity in amino acid biosynthetic pathways, relative amino acid abundances, tRNA and aminoacyl synthetase availabilities, and evolutionary processes.
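The abstract does not say which correlation statistic was used, so the following is only a minimal sketch of one way such an analysis could be set up: it assumes that each protein is reduced to its fractional usage of the 20 standard amino acids and that two residues are compared by the Pearson correlation of those fractions across a protein set. The function names (`residue_frequencies`, `pearson`, `usage_correlation`, `group_frequency`) and the toy sequences are hypothetical and are not taken from the paper.

```python
from math import sqrt

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(sequence):
    """Return the fractional usage of each standard amino acid in one protein."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    total = len(residues)
    return {aa: residues.count(aa) / total for aa in AMINO_ACIDS}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one sample is constant
    return sxy / sqrt(sxx * syy)


def usage_correlation(sequences, aa1, aa2):
    """Correlate the usage of two residues across a set of proteins."""
    freqs = [residue_frequencies(s) for s in sequences]
    return pearson([f[aa1] for f in freqs], [f[aa2] for f in freqs])


def group_frequency(freqs, group):
    """Aggregate usage of a residue group, e.g. "AGP" for Ala, Gly, Pro."""
    return sum(freqs[aa] for aa in group)


if __name__ == "__main__":
    # Toy sequences for illustration only; not real proteins.
    toy_set = ["KEAAAA", "KKEEAA", "KKKEEE"]
    print("K vs E:", usage_correlation(toy_set, "K", "E"))
    print("K vs A:", usage_correlation(toy_set, "K", "A"))
```

Under the same assumptions, the aggregate comparison in (iii) could be approximated by applying `pearson` to `group_frequency(f, "AGP")` and `group_frequency(f, "KIYNF")` computed for each protein in a set.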
Department of Mathematics, Stanford University, CA 94305.
Record created on 2007-12-17, modified on 2016-08-08