From Genes to Organisms: Bioinformatics System Models and Software

Schaffter, Thomas

doi:10.5075/epfl-thesis-6081

doctoral thesis

2014

The expression of genes is controlled by regulatory networks, which perform specific functions in a cell. Gene networks play a crucial role in the development of multicellular organisms by precisely coordinating spatial and temporal gene expression patterns during different developmental stages. Unravelling and modelling these networks is key to eventually gaining a complete understanding of developmental processes and genetically related diseases. In this thesis, we present a comprehensive framework for reverse engineering gene regulatory networks, which required the development of many methods in very diverse research fields. A second important contribution is the implementation of these methods as extensible, user-friendly, open-source computational tools. Over the last decade, numerous methods have been developed for inferring regulatory networks from gene expression data. However, relatively little effort has been put into evaluating the performance of those methods, owing to the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods, available to the community as open-source software called GeneNetWeaver (GNW). In addition to generating detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic (ROC) curves. Furthermore, we used GNW to provide the international DREAM (Dialogue for Reverse Engineering Assessments and Methods) competition with three network inference challenges (DREAM3, DREAM4, and DREAM5).
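As a rough illustration of how a ranked list of predicted edges can be scored against a gold-standard network, the following Python sketch computes the area under the ROC curve (AUROC) and the area under the precision-recall curve (AUPR). It is not GNW's implementation. It assumes that the prediction ranks every candidate edge exactly once with no ties, and it estimates AUPR with the step-wise average-precision sum rather than any particular interpolation scheme. The function name `auroc_aupr` is hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)

def auroc_aupr(ranked: List[Edge], gold: Set[Edge]) -> Tuple[float, float]:
    """Score a complete, tie-free ranking of candidate edges against a gold standard.

    Hypothetical helper (not GNW code).
    AUROC: fraction of (true edge, non-edge) pairs in which the true edge is ranked higher.
    AUPR: estimated as average precision, i.e. the mean of the precision values
    reached at the rank of each true edge.
    """
    if len(set(ranked)) != len(ranked):
        raise ValueError("ranking contains duplicate edges")
    missing = gold - set(ranked)
    if missing:
        raise ValueError(f"gold edges absent from ranking: {sorted(missing)}")
    n_pos = len(gold)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg <= 0:
        raise ValueError("need at least one true edge and one non-edge")

    tp = 0                        # true edges seen so far
    correctly_ordered_pairs = 0   # (true edge above non-edge) pairs
    precision_sum = 0.0
    for rank, edge in enumerate(ranked, start=1):
        if edge in gold:
            tp += 1
            precision_sum += tp / rank   # precision at this recall step
        else:
            correctly_ordered_pairs += tp  # every true edge seen so far outranks this non-edge
    return correctly_ordered_pairs / (n_pos * n_neg), precision_sum / n_pos
```

For example, ranking the candidate edges A→B, A→C, B→C, C→A against the gold standard {A→B, B→C} gives AUROC = 0.75 and AUPR ≈ 0.833; a ranking that places both true edges first scores 1.0 on both metrics.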
In the context of the DREAM competition, 91 teams submitted about 900 network predictions to evaluate the performance of their methods on GNW-generated benchmarks. Today, the accuracy of more than 25,000 gene network reconstructions has been evaluated by GNW users. Gene regulatory networks are often organized into groups, modules or communities of related genes and proteins carrying out specific biological functions. Here, we also address the rational decomposition of (reconstructed) biological networks into functional modules. We present an extensible and modular framework for community structure detection in networks called Jmod. Jmod implements state-of-the-art community structure detection methods, including Newman's spectral algorithm and a modularity optimization method based on a genetic algorithm (GA) that we developed. The performance of these methods has been evaluated on biological and in silico networks. The application of these methods is not limited to gene regulatory networks: they can also provide insight into the community structure of neural, social, and technological networks, for instance. However, modularity optimization methods are known to be affected by a resolution limit that makes them fail to detect small communities in large networks. Although several approaches have been proposed to overcome this limitation of modularity-based methods, none of them solves it satisfactorily. Therefore, we developed and implemented a community voting method that combines multiple partitions obtained with our GA-based method into a single partition that is more robust and reliable than the individual partitions. We have shown that this approach successfully overcomes the resolution limit. Furthermore, our method was one of the two best performers in a comparative analysis that profiled the performance of twelve state-of-the-art community structure detection algorithms.
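To make the idea behind Newman's spectral approach concrete, the sketch below computes the modularity Q = (1/2m) sum_ij [A_ij - k_i k_j / 2m] delta(c_i, c_j) of a partition, where A is the adjacency matrix, k_i the degree of node i and m the number of edges, and performs a single spectral bisection: nodes are split according to the signs of the leading eigenvector of the modularity matrix B = A - k k^T / 2m. This is a minimal NumPy illustration for a small undirected, unweighted graph, not Jmod's code; repeated subdivision with the generalized modularity matrix and any refinement steps are omitted, and the function names are hypothetical.

```python
import numpy as np

def modularity_matrix(A: np.ndarray) -> np.ndarray:
    """B_ij = A_ij - k_i k_j / 2m for an undirected graph with adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)   # node degrees
    two_m = k.sum()     # twice the number of edges
    return A - np.outer(k, k) / two_m

def modularity(A: np.ndarray, labels: np.ndarray) -> float:
    """Q = (1/2m) * sum of B_ij over all node pairs assigned to the same community."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    B = modularity_matrix(A)
    same = labels[:, None] == labels[None, :]
    return float(B[same].sum() / A.sum())

def spectral_bisection(A: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Split nodes into two groups by the sign of the leading eigenvector of B.

    Returns all zeros when the leading eigenvalue is not positive, i.e. when no
    division would increase modularity (the network is treated as indivisible).
    """
    B = modularity_matrix(A)
    eigenvalues, eigenvectors = np.linalg.eigh(B)  # B is symmetric; eigenvalues ascending
    if eigenvalues[-1] <= tol:
        return np.zeros(B.shape[0], dtype=int)
    leading = eigenvectors[:, -1]
    return (leading > 0).astype(int)

if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    A = np.zeros((6, 6))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    labels = spectral_bisection(A)
    print(labels, modularity(A, labels))
```

On this toy graph the bisection places the two triangles in different groups, and the resulting partition has Q = 5/14 (about 0.357), whereas putting all nodes in one community gives Q = 0.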
The reconstruction of a developmental gene network in its spatial context remains a considerable challenge. One of the reasons is that this process requires a tremendous amount of spatial and temporal gene expression data, which are usually available only in very limited quantities because of the inherent difficulty of measuring gene expression in an entire organism. Another contribution of this thesis is the development of an image processing application named WingJ for the unsupervised and systematic quantification of the developing Drosophila wing, a classical model for studying the genetic control of tissue size, shape and patterning. First, a parametric model of the morphology, or structure, of the Drosophila wing is inferred from fluorescence images. The segmentation method is based on the design of multiple image processing detection modules, each focusing on the extraction of a specific feature of the wing structure, including its orientation. The approach was later extended to the detection of the Drosophila embryo. The inferred structure model was then used as a convenient coordinate system for measuring gene and protein expression levels. An important feature of the resulting expression maps is that they can be used to compare domains of expression in differentiated systems, for example to visualize the difference in patterns of gene activity between wild-type and mutant wings, or between wings imaged at different time points during development. Moreover, a robust, multiscale quantitative description of the developing wing is obtained by combining morphological and gene expression information from multiple wings, complemented by the output of an automatic cell nuclei detection method that we developed. We have used this method to automatically generate robust quantitative descriptions of wild-type and mutant (pent-deficient) Drosophila wings imaged at 80, 90, 100, and 110 hours after egg laying.
Furthermore, we have shown that these quantitative descriptions can be used to unravel the regulatory interactions of a six-gene wing developmental network.
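The abstract does not detail how expression information from several wings is combined. One simple way to picture the idea of using a structure model as a shared coordinate system is sketched below, under explicit assumptions: each wing's expression profile is assumed to have already been sampled at evenly spaced positions along corresponding structural axes (for instance, from one end of a boundary to the other), possibly with a different number of samples per wing. The profiles are resampled by linear interpolation onto a common normalized coordinate in [0, 1] and then averaged. The function `mean_profile` and its parameters are hypothetical and do not reproduce WingJ.

```python
from typing import Sequence, Tuple

import numpy as np

def mean_profile(
    profiles: Sequence[Sequence[float]], n_points: int = 101
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Average expression profiles from several wings on a common normalized axis.

    Hypothetical helper: each profile holds intensities sampled at evenly spaced
    positions from the start (0.0) to the end (1.0) of the same structural axis.
    Returns the common grid, the mean profile and the (population) standard deviation.
    """
    if not profiles:
        raise ValueError("need at least one profile")
    grid = np.linspace(0.0, 1.0, n_points)
    resampled = []
    for values in profiles:
        v = np.asarray(values, dtype=float)
        if v.size < 2:
            raise ValueError("each profile needs at least two samples")
        x = np.linspace(0.0, 1.0, v.size)        # normalized positions of this wing's samples
        resampled.append(np.interp(grid, x, v))  # linear interpolation onto the common grid
    stack = np.vstack(resampled)                 # one row per wing
    return grid, stack.mean(axis=0), stack.std(axis=0)
```

Normalizing each axis to [0, 1] before averaging is what lets wings of different sizes, or imaged at different developmental times, contribute to the same positions; the standard deviation gives a simple per-position measure of variability across wings.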

Name

EPFL_TH6081.pdf

Type

n/a

Access type

openaccess

License Condition

Copyright

Size

33.76 MB

Format

Adobe PDF

Checksum (MD5)

a23fc8b3e0cf6205dc73e98db14c22f2