Computational studies in epigenomics using histone modification data

Epigenetic factors such as histone modifications are known to play an important role in gene regulation and cell differentiation. Recent advances in technologies such as ChIP-Seq, a high-throughput, high-resolution, and low-cost technology for studying histone modifications and transcription factors, have made large amounts of data available. As a result, computational techniques have become essential for studying and interpreting these data. This thesis focuses primarily on building computational methods to analyze ChIP-Seq histone modification data. The work can be divided into two broad topics: (a) processing ChIP-Seq data computationally to identify regions of biological interest; and (b) using the processed data for higher-level analysis, based on phylogenetic approaches, to study cell differentiation and the evolution of cell types, thereby linking the fields of epigenomics and evolution.

In the first topic, this thesis addresses two problems. (i) We propose ChIPnorm, a two-stage statistical method that normalizes ChIP-Seq data and finds differential regions in the genome, given two histone modification libraries from different cell types. We show that ChIPnorm removes most of the bias in the data, provides a normalization that enables direct comparison of values between the two cell types, and outperforms state-of-the-art techniques in the literature. (ii) We propose probabilistic partitioning methods to discover significant patterns in ChIP-Seq data. These methods are based on expectation-maximization, are simple and flexible, and take into account signal magnitude, shape, strand orientation, and shifts. They run in linear time and improve on the results of state-of-the-art techniques, especially on sparse data.
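
As a rough illustration of the expectation-maximization partitioning idea in (ii), the following Python sketch fits a mixture of Poisson count profiles and treats strand orientation as a latent variable, so that a profile and its mirror image can fall into the same class. This is an assumed, simplified model, not the method developed in the thesis: the function names, the Poisson likelihood, and the farthest-point initialization are illustrative choices, and shifts are not modeled. Each iteration costs O(n·k·L) for n profiles of length L and k classes, which is linear in the number of profiles.

```python
import math


def oriented(x, o):
    """Return a profile as observed (o=0) or strand-flipped (o=1)."""
    return x if o == 0 else x[::-1]


def flip_distance(a, b):
    """Squared Euclidean distance, minimised over the orientation of a."""
    return min(sum((ai - bi) ** 2 for ai, bi in zip(oriented(a, o), b))
               for o in (0, 1))


def poisson_loglik(x, lam):
    """Log-likelihood of counts x under independent Poisson rates lam."""
    return sum(xi * math.log(li) - li - math.lgamma(xi + 1)
               for xi, li in zip(x, lam))


def em_partition(profiles, k, n_iter=50, eps=1e-6):
    """Hypothetical sketch: partition read-count profiles into k classes by EM.

    Each class has a Poisson rate vector (captures shape and magnitude); each
    profile may match its class in either strand orientation. Shifts are not
    modelled. Requires n_iter >= 1. Returns (labels, orientations, rates).
    """
    n, length = len(profiles), len(profiles[0])
    # Deterministic farthest-point initialisation, orientation-invariant.
    centers = [profiles[0]]
    while len(centers) < k:
        centers.append(max(profiles,
                           key=lambda p: min(flip_distance(p, c) for c in centers)))
    rates = [[xi + eps for xi in c] for c in centers]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior over (class, orientation) for every profile.
        resp = []
        for x in profiles:
            logs = [[math.log(weights[c]) + math.log(0.5)
                     + poisson_loglik(oriented(x, o), rates[c])
                     for o in (0, 1)] for c in range(k)]
            top = max(max(row) for row in logs)  # log-sum-exp stabilisation
            probs = [[math.exp(v - top) for v in row] for row in logs]
            z = sum(map(sum, probs))
            resp.append([[v / z for v in row] for row in probs])
        # M-step: mixing weights and weighted means of the oriented profiles.
        for c in range(k):
            total = sum(r[c][o] for r in resp for o in (0, 1))
            weights[c] = max(total / n, eps)
            rates[c] = [(sum(r[c][o] * oriented(x, o)[j]
                             for r, x in zip(resp, profiles) for o in (0, 1))
                         + eps) / (total + eps)
                        for j in range(length)]
    # Hard assignment: most probable class, then most probable orientation.
    labels, orients = [], []
    for r in resp:
        best = max(range(k), key=lambda c: r[c][0] + r[c][1])
        labels.append(best)
        orients.append(0 if r[best][0] >= r[best][1] else 1)
    return labels, orients, rates
```

On toy data made of three copies of an asymmetric profile, three copies of its mirror image, and three copies of a symmetric peak, this sketch places the first six profiles in one class and reports opposite orientations for the mirrored copies. Treating orientation as a latent variable, rather than as a separate class, is what lets the two strands share one rate vector.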
Turning to topic (b), we introduce the concept of cell-type trees, based on the principles of phylogenetic inference applied to ChIP-Seq histone modification data. We define cell-type trees precisely and design algorithms to infer them from the data. Along the way, we develop new data representation techniques and a peak-finder that help us build good cell-type trees. We obtain biologically meaningful results and show that cell-type trees have the potential to be used to study cell differentiation and the evolution of cell types across species.
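
One simple way to illustrate phylogenetic inference on histone modification data is a distance-based tree: encode each cell type as presence/absence of a mark over a common set of genomic regions, compute pairwise Hamming distances, and cluster with UPGMA. The sketch below assumes exactly that. UPGMA, the binary encoding, the function names, and the toy profiles are illustrative stand-ins, not the data representation, peak-finder, or tree-inference algorithm developed in the thesis.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of genomic regions where two binary mark profiles disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(profiles):
    """Hypothetical sketch: infer a rooted cell-type tree with UPGMA.

    `profiles` maps a cell-type name to a binary vector (1 = mark present in
    a region). Returns the topology as a Newick string; branch lengths are
    omitted for brevity.
    """
    size = {name: 1 for name in profiles}  # cluster label -> number of leaves
    dist = {frozenset(p): hamming(profiles[p[0]], profiles[p[1]])
            for p in combinations(profiles, 2)}
    while len(size) > 1:
        # Merge the closest pair of clusters.
        a, b = sorted(min(dist, key=dist.get))
        merged = f"({a},{b})"
        na, nb = size.pop(a), size.pop(b)
        # Size-weighted average distance from the merged cluster to the rest.
        for c in size:
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        # Drop distances that still refer to the two merged clusters.
        dist = {key: d for key, d in dist.items() if a not in key and b not in key}
        size[merged] = na + nb
    return next(iter(size)) + ";"


# Toy, made-up binary profiles over eight regions.
toy = {
    "ES":    [1, 1, 1, 1, 0, 0, 0, 0],
    "NPC":   [1, 1, 1, 0, 0, 0, 0, 0],
    "Hep":   [0, 0, 0, 0, 1, 1, 1, 0],
    "Liver": [0, 0, 0, 0, 1, 1, 1, 1],
}
print(upgma(toy))  # ((ES,NPC),(Hep,Liver));
```

UPGMA is chosen here only because it is short and deterministic; it assumes a constant rate of change across lineages, so character-based methods such as maximum parsimony may be a better fit for real mark data.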
