Mathematical models and computational methods for the analysis of genome-scale protein synthesis

Proteins are a ubiquitous and indispensable component of every living organism, from simple bacteria to mammals. Even the simplest organisms contain several thousand different protein species, which adopt a great variety of structures and therefore of roles, allowing them to precisely orchestrate the functioning of each cell. Despite this diversity of functions and shapes, all proteins share the same origin: the DNA, which encodes every protein in the same way that a dictionary contains the definition of each word. When a cell needs a specific protein, it "reads" this "DNA dictionary" and translates it into another "language": from the nucleotide sequence of the DNA to an amino acid sequence, which is the basis of each protein. This process of "reading" the DNA to form a protein, more properly called protein synthesis, lies at the heart of every organism. Indeed, 80% of the cellular energy is devoted to protein synthesis. The main mechanisms of this process are the same for all proteins and for all kingdoms of life. A good understanding of this process is therefore essential to biology: any malfunction could potentially lead to disease and, conversely, any step of protein synthesis could be a prospective drug target. This is already the case for various antibiotics. In addition, a good understanding of this system is valuable in recombinant vaccine and recombinant drug production, where it can help improve the yield of these proteins. Recombinant protein technology is used, for example, to synthesize the hepatitis B vaccine in yeast cells or recombinant human insulin in Escherichia coli cells. This is done by inserting into the organism a DNA plasmid that encodes the given protein, so that the host organism then synthesizes this protein almost as if it were encoded by its own DNA.
A further benefit of an in-depth knowledge of protein synthesis relates to circuit design in synthetic biology. There, the goal is to design cells that respond to their environment in a predefined manner, and again this is done by inserting specific genes into the cells. Understanding protein synthesis can help to estimate the sensitivity of such a system and to define the characteristics of its response. The many facets of protein synthesis and its regulation are becoming increasingly well understood. Nevertheless, it is also becoming increasingly evident that the classical approach of studying every component in isolation should be set aside and that the system, or the cell, should be studied as a whole because of the interconnections between all of its elements: we have entered the era of systems biology. With the recent advances in genomics, transcriptomics, proteomics and other -omics technologies, we are able to measure the state of cells under different conditions in a high-throughput manner, which enables the global, genome-scale view that systems biology aims for. The huge amount of data collected by these high-throughput techniques poses a new challenge: how can we efficiently integrate these data to make sense of them, both to gain a deeper understanding and to design and optimize novel systems? A general answer is that computational approaches are needed. A model can be built to represent the system, and its outputs can then be compared to the experimental measurements. The great advantage of the modeling and simulation approach is that we can build many different in silico systems and compare them to determine which one best represents our current knowledge.
Such an in silico system can then be subjected to different "virtual" conditions in order to observe how it would behave in response. This can be repeated for many cases and conditions in a very cost- and time-effective way compared with an in vitro or in vivo experiment. In this thesis, we aim to integrate such high-throughput data into a model for a better understanding of protein synthesis. We mainly focus on the often-neglected steps of translation, to observe their possible influence on the system and their role in its regulation. To this end, we build a model that incorporates all the steps of translation, including the various intermediate steps of translation elongation. We then develop a novel, efficient and exact stochastic algorithm, targeted here at simulating translation at the genome scale while accounting for the competition between mRNAs for shared resources. This algorithm could easily be adapted to systems other than translation. A further novelty is a methodology to analyze and estimate polysome sizes from experimental measurements in prokaryotes. By integrating various experimental measurements into our model of translation, we additionally estimate translation characteristics at the genome scale for prokaryotic and eukaryotic cells, and we observe how the system has been optimized to cope with cellular needs. We further estimate the sensitivity of protein synthesis to different perturbations, such as changes in initiation, elongation and termination rates, changes in ribosome availability or mRNA copy numbers, and changes following starvation conditions. Taken together, the results of this thesis show that regulation at the translation steps is stronger than is commonly assumed and that it can have many implications for the system.
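To make the idea of stochastic simulation with competition for shared resources concrete, the following is a minimal sketch and not the algorithm developed in the thesis. It uses the standard Gillespie direct method on a simple exclusion model, in which several mRNAs draw ribosomes from one common pool. The function name, the parameter names, the rate values, and the one-codon ribosome footprint are all illustrative assumptions.

```python
import random


def simulate_translation(lengths, init_rates, elong_rate, term_rate,
                         total_ribosomes, t_end, seed=0):
    """Gillespie direct-method sketch of translation on several mRNAs.

    Assumptions (hypothetical, for illustration only):
    - each ribosome covers exactly one codon;
    - initiation on mRNA m has propensity init_rates[m] * free ribosomes;
    - elongation and termination rates are identical for all codons.
    """
    rng = random.Random(seed)
    # lattices[m][i] is True when codon i of mRNA m holds a ribosome
    lattices = [[False] * n for n in lengths]
    free = total_ribosomes          # shared pool of unbound ribosomes
    proteins = [0] * len(lengths)   # completed proteins per mRNA
    t = 0.0

    while True:
        # List every reaction that can currently fire with its propensity.
        events = []
        for m, lat in enumerate(lattices):
            if free > 0 and not lat[0]:
                events.append((init_rates[m] * free, "init", m, 0))
            last = len(lat) - 1
            for i, occupied in enumerate(lat):
                if not occupied:
                    continue
                if i == last:
                    events.append((term_rate, "term", m, i))
                elif not lat[i + 1]:  # exclusion: next codon must be empty
                    events.append((elong_rate, "elong", m, i))

        total = sum(a for a, _, _, _ in events)
        if total <= 0.0:
            break
        # Waiting time to the next reaction is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break

        # Pick one reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        acc = 0.0
        for a, kind, m, i in events:
            acc += a
            if threshold < acc:
                break

        lat = lattices[m]
        if kind == "init":
            lat[0] = True
            free -= 1
        elif kind == "elong":
            lat[i] = False
            lat[i + 1] = True
        else:  # termination releases the ribosome back to the pool
            lat[i] = False
            free += 1
            proteins[m] += 1

    return proteins, free, lattices


# Two mRNAs of 20 and 30 codons competing for 15 ribosomes (made-up rates).
proteins, free, _ = simulate_translation(
    [20, 30], [0.05, 0.01], 10.0, 2.0, 15, 50.0, seed=1)
print(proteins, free)
```

This sketch rebuilds the full event list at every step, which is simple but scales poorly. A genome-scale simulation would need a more efficient bookkeeping of propensities, which is the kind of cost an efficient exact algorithm is meant to avoid.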

