Large-scale variational inference for Bayesian joint regression modelling of high-dimensional genetic data

Ruffieux, Hélène

doi:10.5075/epfl-thesis-9139

2019

Abstract

Genetic association studies have become increasingly important in understanding the molecular bases of complex human traits. The specific analysis of intermediate molecular traits, via quantitative trait locus (QTL) studies, has recently received much attention, prompted by the advance of high-throughput technologies for quantifying gene, protein and metabolite levels. Of great interest is the detection of weak trans-regulatory effects between a genetic variant and a distal gene product. In particular, hotspot genetic variants, which remotely control the levels of many molecular outcomes, may initiate decisive functional mechanisms underlying disease endpoints. This thesis proposes a Bayesian hierarchical approach for joint analysis of QTL data on a genome-wide scale. We consider a series of parallel sparse regressions combined in a hierarchical manner to flexibly accommodate high-dimensional responses (molecular levels) and predictors (genetic variants), and we present new methods for large-scale inference. Existing approaches have limitations. Conventional marginal screening does not account for local dependencies and association patterns common to multiple outcomes and genetic variants, whereas joint modelling approaches are restricted to relatively small datasets by computational constraints. Our novel framework allows information-sharing across outcomes and variants, thereby enhancing the detection of weak trans and hotspot effects, and implements tailored variational inference procedures that allow simultaneous analysis of data for an entire QTL study, comprising hundreds of thousands of predictors, and thousands of responses and samples. The present work also describes extensions to leverage spatial and functional information on the genetic variants, for example, using predictor-level covariates such as epigenomic marks. 
Moreover, we augment variational inference with simulated annealing and parallel expectation-maximisation schemes in order to enhance exploration of highly multimodal spaces and allow efficient empirical Bayes estimation. Our methods, publicly available as packages implemented in R and C++, are extensively assessed in realistic simulations. Their advantages are illustrated in several QTL applications, including a large-scale proteomic QTL study on two clinical cohorts that highlights novel candidate biomarkers for metabolic disorders.

Details

Title Large-scale variational inference for Bayesian joint regression modelling of high-dimensional genetic data

Author(s) Ruffieux, Hélène

Advisor(s)

Davison, Anthony C.
Hager, Jörg

Pagination 208

Date 2019

Publisher Lausanne, EPFL

Keywords

Bayesian sparse regression; Hierarchical model; High-dimensional data; Molecular quantitative trait locus analysis; Pleiotropy; Statistical genetics; Variable selection; Variational inference.

Language English

DOI https://doi.org/10.5075/epfl-thesis-9139

Laboratories STAT

Record creation date 2019-03-27
