Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv
Original genome annotations need to be updated regularly if the information they contain is to remain accurate and relevant. Here, the complete re-annotation of the genome sequence of Mycobacterium tuberculosis strain H37Rv is presented almost 4 years after the first submission. Eighty-two new protein-coding sequences (CDS) have been included, 22 of which have a predicted function. The majority were identified by manual or automated re-analysis of the genome, and most are shorter than the 100-codon cut-off used in the initial genome analysis. The functional classification of 643 CDS has been changed, based principally on recent sequence comparisons and new experimental data from the literature. More than 300 gene names and over 1000 targeted citations have been added, and the lengths of 60 genes have been modified. It is now possible to assign a function to 2058 proteins (52% of the 3995 proteins predicted); only 376 putative proteins share no homology with known proteins and thus could be unique to M. tuberculosis.