Meta-analysis of Incomplete Microarray Studies

Leboucq, Alix

doi:10.5075/epfl-thesis-6371

doctoral thesis

2014

Meta-analysis of microarray studies to produce an overall gene list is relatively straightforward when complete data are available. When some studies lack information, for example by providing only a ranked list of genes, it is common to reduce all studies to ranked lists before combining them. Since this entails a loss of information, we consider a hierarchical Bayes approach to meta-analysis that uses different types of information from different studies: the full data matrix, summary statistics, or ranks. The model uses an informative prior for the parameter of interest to aid the detection of differentially expressed genes. Simulations show that the new approach can give substantial power gains compared to classical meta-analysis and list aggregation methods. A meta-analysis of 11 published ovarian cancer studies with different data types identifies genes known to be involved in ovarian cancer and shows significant enrichment, while controlling the number of false positives.
Independence of genes is a common assumption in microarray data analysis, and one made in the model above, although it does not hold in practice. Indeed, genes are activated in groups called modules, that is, sets of co-regulated genes. These modules are usually defined by biologists, based on the position of the genes on the chromosome or on known biological pathways (for example KEGG or GO). Our goal in the second part of this work is to define modules common to several studies automatically. We use an empirical Bayes approach to estimate a sparse correlation matrix common to all studies, and identify modules by clustering. Simulations show that our approach performs as well as or better than existing methods in detecting modules across several datasets. We also develop a method based on extreme value theory to detect scattered genes, which do not belong to any module. This automatic module detection is very fast and produces accurate modules in our simulation studies. Application to real data results in a huge dimension reduction, which allows us to fit the hierarchical Bayesian model to modules without a heavy computational burden. Differentially expressed modules identified by this analysis show significant enrichment, a promising result for future applications of the method.
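
For readers unfamiliar with the list aggregation baseline mentioned in the abstract, the following is a minimal sketch of one simple variant: combining ranked gene lists by mean rank. It is not the method developed in the thesis. The function name `mean_rank_aggregate`, the gene labels, the example rankings, and the rule for genes missing from a list are all assumptions made for illustration.

```python
# Illustrative baseline only: mean-rank (Borda-style) aggregation of ranked lists.
# All gene labels and rankings below are invented for the example.

def mean_rank_aggregate(ranked_lists):
    """Combine ranked gene lists (best gene first) by average rank.

    Assumption: a gene missing from a list is assigned rank len(list) + 1
    in that list, i.e. just below every gene the study did report.
    """
    genes = sorted({g for lst in ranked_lists for g in lst})
    scores = {}
    for g in genes:
        total = 0
        for lst in ranked_lists:
            total += (lst.index(g) + 1) if g in lst else (len(lst) + 1)
        scores[g] = total / len(ranked_lists)
    # Lower mean rank means stronger overall evidence; ties broken by name.
    ordering = sorted(genes, key=lambda g: (scores[g], g))
    return ordering, scores


study_a = ["geneA", "geneB", "geneC", "geneD"]
study_b = ["geneB", "geneA", "geneD", "geneC"]
study_c = ["geneB", "geneC", "geneA"]  # incomplete list: geneD not reported

combined, scores = mean_rank_aggregate([study_a, study_b, study_c])
print(combined)
```

Only the ordering within each study enters the result, so effect sizes and variances are discarded. This is the loss of information that the abstract refers to.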

Name

EPFL_TH6371.pdf

Access type

openaccess

Size

28.46 MB

Format

Adobe PDF

Checksum (MD5)

3df62a9fa4d526724ee3e4230975766f