Meta-analysis of Incomplete Microarray Studies

Leboucq, Alix

doi:10.5075/epfl-thesis-6371

2014

Abstract

Meta-analysis of microarray studies to produce an overall gene list is relatively straightforward when complete data are available. When some studies lack information, for example by providing only a ranked list of genes, it is common to reduce all studies to ranked lists before combining them. Since this entails a loss of information, we consider a hierarchical Bayes approach to meta-analysis that uses different types of information from different studies: the full data matrix, summary statistics, or ranks. The model uses an informative prior for the parameter of interest to aid the detection of differentially expressed genes. Simulations show that the new approach can give substantial power gains compared to classical meta-analysis and list aggregation methods. A meta-analysis of 11 published ovarian cancer studies with different data types identifies genes known to be involved in ovarian cancer and shows significant enrichment, while controlling the number of false positives.

Independence of genes is a common assumption in microarray data analysis, and is also made in the model above, although it does not hold in practice. Indeed, genes are activated in groups called modules, which are sets of co-regulated genes. These modules are usually defined by biologists, based on the position of the genes on the chromosome or on known biological pathways (for example KEGG or GO). Our goal in the second part of this work is to define modules common to several studies automatically. We use an empirical Bayes approach to estimate a sparse correlation matrix common to all studies, and identify modules by clustering. Simulations show that our approach performs as well as or better than existing methods in detecting modules across several datasets. We also develop a method based on extreme value theory to detect scattered genes, which do not belong to any module. This automatic module detection is very fast and produces accurate modules in our simulation studies. Application to real data results in a huge dimension reduction, which allows us to fit the hierarchical Bayesian model to modules without the computational burden. Differentially expressed modules identified by this analysis show significant enrichment, indicating that the method is promising for future applications.
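
For readers unfamiliar with the rank-reduction baseline mentioned at the start of the abstract, the following is a minimal sketch of one common list aggregation scheme, mean-rank (Borda-style) aggregation. It is not the thesis's hierarchical Bayes model, and it is not necessarily the aggregation method used as a comparison in the thesis. The function name `aggregate_ranks`, the gene labels, and the convention of ranking a missing gene one place below the end of a list are all assumptions made for illustration.

```python
from statistics import mean


def aggregate_ranks(ranked_lists):
    """Combine ranked gene lists by mean rank (Borda-style aggregation).

    Each input list orders genes from strongest to weakest evidence of
    differential expression. A gene absent from a list is assigned rank
    len(list) + 1 (an illustrative convention, not taken from the thesis).
    Returns the consensus ordering and the mean rank of every gene.
    """
    # Map each gene to its 1-based position within each study's list.
    positions = [{gene: i + 1 for i, gene in enumerate(lst)} for lst in ranked_lists]
    genes = sorted({gene for lst in ranked_lists for gene in lst})

    mean_rank = {
        gene: mean(pos.get(gene, len(pos) + 1) for pos in positions)
        for gene in genes
    }
    # Sort by mean rank; break ties alphabetically for a deterministic result.
    consensus = sorted(genes, key=lambda g: (mean_rank[g], g))
    return consensus, mean_rank


# Hypothetical ranked lists from three studies (placeholder gene labels).
studies = [
    ["geneA", "geneB", "geneC", "geneD"],
    ["geneB", "geneA", "geneD", "geneC"],
    ["geneB", "geneC", "geneA"],  # this study did not report geneD
]
consensus, mean_rank = aggregate_ranks(studies)
print(consensus)
```

Only the ordering within each study enters the calculation, so effect sizes and full expression matrices are discarded even when a study provides them. This is the loss of information that motivates the thesis's model.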

Details

Title Meta-analysis of Incomplete Microarray Studies

Author(s) Leboucq, Alix

Advisor(s)

Davison, Anthony C.
Goldstein, Darlène

Date 2014

Publisher Lausanne, EPFL

Keywords

clustering; empirical Bayes estimation; hierarchical Bayesian model; high-dimensional data; large covariance matrix estimation; MCMC; meta-analysis; microarray gene expression data; modules

Language English

DOI https://doi.org/10.5075/epfl-thesis-6371

Other identifier(s) URN: urn:nbn:ch:bel-epfl-thesis6371-3

Laboratories STAT

Record creation date 2014-10-20
