Computational studies in epigenomics using histone modification data
Epigenetic factors like histone modifications are known to play an important role in gene
regulation and cell differentiation. Advances in technologies such as ChIP-Seq, a
high-throughput, high-resolution, and low-cost technology for studying histone
modifications and transcription factors, have recently made large amounts of data available.
Computational techniques have therefore become important for studying and interpreting these data.
In this thesis, we focus primarily on building computational methods to analyze and
study ChIP-Seq histone modification data. The work can be divided into two broad topics: (a)
processing ChIP-Seq data computationally to identify regions of biological interest; (b)
using the processed data for higher-level analysis to study problems in cell differentiation and
the evolution of cell types, based on phylogenetic approaches.
On the first topic, this thesis addresses two problems: (i) We propose
a two-stage statistical method, called ChIPnorm, to normalize ChIP-Seq data, and to find
differential regions in the genome, given two libraries of histone modifications of different
cell types. We show that our method removes most of the bias in the data and also provides a
normalization that enables direct comparison of values between the two cell types. We show
that our method outperforms state-of-the-art techniques from the literature. (ii) We propose
probabilistic partitioning methods to discover significant patterns in ChIP-Seq data. Our
methods work on the principle of expectation-maximization, are simple and flexible, and take
into account signal magnitude, shape, strand orientation, and shifts. They run in linear time and
give improved results over state-of-the-art techniques, especially when used on sparse data.
In the second topic, we try to provide a link between the fields of epigenomics and evolution.
We introduce the concept of cell-type trees based on the principles of phylogenetic inference on
ChIP-Seq histone modification data. We define these cell-type trees precisely and design
algorithmic techniques to infer them from the data. In the process, we develop new
data representation techniques and also a peak-finder to help us build good cell-type trees.
We obtain biologically meaningful results and show that cell-type trees have the potential to
support the study of cell differentiation and the evolution of cell types across species.