Infoscience

doctoral thesis

Meta-analysis of Incomplete Microarray Studies

Leboucq, Alix  
2014

Meta-analysis of microarray studies to produce an overall gene list is relatively straightforward when complete data are available. When some studies lack information, for example by providing only a ranked list of genes, it is common to reduce all studies to ranked lists before combining them. Since this entails a loss of information, we consider a hierarchical Bayes approach to meta-analysis that uses different types of information from different studies: the full data matrix, summary statistics, or ranks. The model uses an informative prior for the parameter of interest to aid the detection of differentially expressed genes. Simulations show that the new approach can give substantial power gains compared to classical meta-analysis and list aggregation methods. A meta-analysis of 11 published ovarian cancer studies with different data types identifies genes known to be involved in ovarian cancer and shows significant enrichment, while controlling the number of false positives.

Independence of genes is a common assumption in microarray data analysis, and in the model above, although it does not hold in practice. Indeed, genes are activated in groups called modules: sets of co-regulated genes. These modules are usually defined by biologists, based on the position of the genes on the chromosome or on known biological pathways (KEGG or GO, for example). Our goal in the second part of this work is to define modules common to several studies automatically. We use an empirical Bayes approach to estimate a sparse correlation matrix common to all studies, and identify modules by clustering. Simulations show that our approach performs as well as or better than existing methods at detecting modules across several datasets. We also develop a method based on extreme value theory to detect scattered genes, which do not belong to any module. This automatic module detection is very fast and produces accurate modules in our simulation studies. Application to real data results in a huge dimension reduction, which allows us to fit the hierarchical Bayesian model to modules without the computational burden. Differentially expressed modules identified by this analysis show significant enrichment, indicating that the method is promising for future applications.
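
The baseline that the abstract describes, reducing every study to a ranked list and then combining the lists, can be sketched as follows. This is not the hierarchical Bayes model of the thesis; it is a minimal mean-rank (Borda-style) aggregation, chosen here only as one simple example of a list aggregation method. The gene names, the scores, and the helper names `ranks_from_scores` and `mean_rank_aggregate` are all hypothetical.

```python
from statistics import mean


def ranks_from_scores(scores):
    """Turn per-gene statistics into ranks (1 = largest absolute statistic).

    The magnitudes are discarded here, which is the loss of information
    the abstract refers to.
    """
    ordered = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def mean_rank_aggregate(rank_lists):
    """Order the genes shared by all studies by their mean rank."""
    genes = set.intersection(*(set(ranks) for ranks in rank_lists))
    mean_ranks = {g: mean(ranks[g] for ranks in rank_lists) for g in genes}
    # Break ties on the gene name so the result is deterministic.
    return sorted(genes, key=lambda g: (mean_ranks[g], g))


# Hypothetical toy data: two studies report t-like statistics,
# a third study reports only a ranked list.
study_a = {"G1": 4.2, "G2": -0.3, "G3": 2.1, "G4": 0.8}
study_b = {"G1": 3.5, "G2": 0.1, "G3": -2.8, "G4": -0.4}
study_c_ranks = {"G3": 1, "G1": 2, "G4": 3, "G2": 4}

rank_lists = [
    ranks_from_scores(study_a),
    ranks_from_scores(study_b),
    study_c_ranks,
]
overall = mean_rank_aggregate(rank_lists)
print(overall)
```

In this sketch the statistics from the first two studies influence the result only through their ordering, so a gene with a statistic of 4.2 and one with 2.2 would be treated identically if both ranked first. That is the limitation that motivates using the full data or summary statistics where they are available.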

Type
doctoral thesis
DOI
10.5075/epfl-thesis-6371
Author(s)
Leboucq, Alix  
Advisors
  • Davison, Anthony C.
  • Goldstein, Darlène
Jury

Prof. T. Mountford (president); Prof. A.C. Davison, Dr D. Goldstein (thesis directors); Dr M. Delorenzi, Prof. S. Morgenthaler, Dr L. Wernisch (examiners)

Date Issued

2014

Publisher

EPFL

Publisher place

Lausanne

Public defense date

2014-10-17

Thesis number

6371

Subjects

  • clustering
  • empirical Bayes estimation
  • hierarchical Bayesian model
  • high-dimensional data
  • large covariance matrix estimation
  • MCMC
  • meta-analysis
  • microarray gene expression data
  • modules

EPFL units
STAT  
Faculty
SB  
School
MATHAA  
Doctoral School
EDMA  
Available on Infoscience
October 20, 2014
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/107493

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved