Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence

Cole, S T; Brosch, R; Parkhill, J; Garnier, T; Churcher, C; Harris, D; Gordon, S V; Eiglmeier, K; Gas, S; Barry, C E; Tekaia, F; Badcock, K; Basham, D; Brown, D; Chillingworth, T; Connor, R; Davies, R; Devlin, K; Feltwell, T; Gentles, S; Hamlin, N; Holroyd, S; Hornsby, T; Jagels, K; Krogh, A; McLean, J; Moule, S; Murphy, L; Oliver, K; Osborne, J; Quail, M A; Rajandream, M A; Rogers, J; Rutter, S; Seeger, K; Skelton, J; Squares, R; Squares, S; Sulston, J E; Taylor, K; Whitehead, S; Barrell, B G

doi:10.1038/31159

1998

Abstract

Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.