The powerful combination of genomics and bioinformatics is providing a wealth of information about Mycobacterium tuberculosis, the aetiological agent of human tuberculosis, and this information will facilitate the conception and development of new therapies. The starting point for genome sequencing was the integrated map of the 4.4 Mb circular chromosome of the widely used, virulent reference strain M. tuberculosis H37Rv. Cosmids and bacterial artificial chromosomes were selected from ordered libraries and subjected to systematic shotgun sequence analysis. Sequencing these mapped clones one at a time simplified sequence assembly, which would otherwise have been complicated because the genome is rich in repetitive DNA. As in most bacteria, >90% of the potential coding capacity is used, and probable or tentative functions could be attributed to >70% of the genes. The potential biological roles of two of the principal driving forces in genome dynamics, insertion sequence elements and polymorphic multigene families, are discussed.