Novel bioinformatics routines have been used to provide a more detailed definition of the proteome of Mycobacterium tuberculosis H37Rv. Over half of the current proteins result from gene duplication or domain shuffling events, while one-sixth show no similarity to polypeptides described in other organisms. Prominent among the genes that appear to have been duplicated on numerous occasions are those involved in fatty acid metabolism and regulation of gene expression, and those encoding the unusually glycine-rich PE and PPE proteins. Protein similarity analysis, coupled with inspection of the genetic neighbourhood, was used to explore possible functional relatedness. This uncovered four large mce operons whose proteins may mediate initial interactions between the tubercle bacillus and host cells, together with a cluster of genes that might encode components of a structure required for secretion of ESAT-6-like proteins. Close linkage of the mmpL genes, encoding large membrane proteins, with those required for fatty acid metabolism suggests involvement in lipid transport. Compared with free-living bacteria, M. tuberculosis has a significantly smaller transport protein repertoire, which may reflect its intracellular lifestyle.