Infoscience

doctoral thesis

Large-scale variational inference for Bayesian joint regression modelling of high-dimensional genetic data

Ruffieux, Hélène  
2019

Genetic association studies have become increasingly important in understanding the molecular bases of complex human traits. The specific analysis of intermediate molecular traits, via quantitative trait locus (QTL) studies, has recently received much attention, prompted by the advance of high-throughput technologies for quantifying gene, protein and metabolite levels. Of great interest is the detection of weak trans-regulatory effects between a genetic variant and a distal gene product. In particular, hotspot genetic variants, which remotely control the levels of many molecular outcomes, may initiate decisive functional mechanisms underlying disease endpoints.

This thesis proposes a Bayesian hierarchical approach for joint analysis of QTL data on a genome-wide scale. We consider a series of parallel sparse regressions combined in a hierarchical manner to flexibly accommodate high-dimensional responses (molecular levels) and predictors (genetic variants), and we present new methods for large-scale inference.
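As a rough illustration of what "parallel sparse regressions combined in a hierarchical manner" can look like, the sketch below writes one spike-and-slab regression per response and ties the regressions together through a predictor-level inclusion probability. All symbols here (n samples, p predictors indexed by s, q responses indexed by t, and the parameters \beta, \gamma, \omega, \tau, \sigma, a_s, b_s) are assumptions introduced for this example; the abstract does not give the exact specification used in the thesis.

```latex
% Illustrative hierarchical sparse regression (notation assumed, not taken from the abstract)
% y_t: n-vector for response t;  X: n x p matrix of genetic variants
\begin{align*}
  y_t \mid \beta_t, \tau_t
    &\sim \mathcal{N}_n\!\left(X \beta_t,\ \tau_t^{-1} I_n\right),
    & t &= 1, \dots, q, \\
  \beta_{st} \mid \gamma_{st}, \sigma^2, \tau_t
    &\sim \gamma_{st}\, \mathcal{N}\!\left(0,\ \sigma^2 \tau_t^{-1}\right)
      + (1 - \gamma_{st})\, \delta_0,
    & s &= 1, \dots, p, \\
  \gamma_{st} \mid \omega_s
    &\sim \mathrm{Bernoulli}(\omega_s), \\
  \omega_s
    &\sim \mathrm{Beta}(a_s, b_s).
\end{align*}
```

In a formulation of this kind, the binary indicator \gamma_{st} records whether variant s is associated with response t, and the shared probability \omega_s lets a variant that affects many responses (a hotspot) borrow strength across all q regressions.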

Existing approaches have limitations. Conventional marginal screening does not account for local dependencies and association patterns common to multiple outcomes and genetic variants, whereas joint modelling approaches are restricted to relatively small datasets by computational constraints. Our novel framework allows information-sharing across outcomes and variants, thereby enhancing the detection of weak trans and hotspot effects, and implements tailored variational inference procedures that allow simultaneous analysis of data for an entire QTL study, comprising hundreds of thousands of predictors, and thousands of responses and samples.

The present work also describes extensions to leverage spatial and functional information on the genetic variants, for example, using predictor-level covariates such as epigenomic marks. Moreover, we augment variational inference with simulated annealing and parallel expectation-maximisation schemes in order to enhance exploration of highly multimodal spaces and allow efficient empirical Bayes estimation.

Our methods, publicly available as packages implemented in R and C++, are extensively assessed in realistic simulations. Their advantages are illustrated in several QTL applications, including a large-scale proteomic QTL study on two clinical cohorts that highlights novel candidate biomarkers for metabolic disorders.

Type
doctoral thesis
DOI
10.5075/epfl-thesis-9139
Author(s)
Ruffieux, Hélène  
Advisors
Davison, Anthony C.; Hager, Jörg
Jury

Prof. Kathryn Hess Bellwald (president); Prof. Anthony C. Davison, Dr Jörg Hager (thesis directors); Prof. Stephan Morgenthaler, Prof. Chris C. Holmes, Prof. Sylvia Richardson (examiners)

Date Issued

2019

Publisher

EPFL

Publisher place

Lausanne

Public defense date

2019-04-05

Thesis number

9139

Total number of pages

208

Subjects

  • Bayesian sparse regression
  • Hierarchical model
  • High-dimensional data
  • Molecular quantitative trait locus analysis
  • Pleiotropy
  • Statistical genetics
  • Variable selection
  • Variational inference

EPFL units
STAT  
Faculty
SB  
School
MATHAA  
Doctoral School
EDMA  
Available on Infoscience
March 27, 2019
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/155744

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved