The genetic architecture of complex human traits at the dawn of genomic medicine
This thesis explores the genetic architecture of complex human traits at the dawn of genomic medicine.
The mechanisms underlying the highly polygenic nature of most complex human traits remain unknown. The first chapter explores a possible explanatory model in which variant effects arise through an indirect mechanism, namely competition among genes for shared intracellular resources such as ribosomes. Our findings show that, under most reasonable assumptions, resource competition should not be expected to have much impact either on the protein expression levels of individual genes or on complex trait outcomes.
The prediction accuracy of polygenic scores (PGS) remains relatively modest compared to what is expected given the estimated heritability of traits. Traditionally, a PGS is constructed from a large number of genetic variants, most of which have weak additive effects. Recent machine learning methods could improve PGS by also aggregating epistatic effects. To evaluate these different methods, we conducted an experiment based on the innovative concept of crowdsourcing, detailed in the second chapter. We collaborated with opensnp.org, an open repository where people share their genotyping data and phenotypic information, and with crowdai.org, a platform that allowed us to create a public competition for the genomic prediction of height. The challenge lasted three months and attracted 138 participants. This was the first crowdsourcing challenge based on publicly available genome-wide genotyping data.
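As an illustration of the traditional additive construction mentioned above, a PGS can be sketched as a weighted sum of allele dosages. This is a minimal sketch, not the pipeline used in the thesis: the function name `polygenic_score`, the toy dosages, and the toy effect sizes are all hypothetical.

```python
def polygenic_score(dosages, effect_sizes):
    """Additive PGS for one individual: sum of dosage * per-variant effect.

    dosages      -- allele counts (0, 1 or 2) per variant (hypothetical input)
    effect_sizes -- per-variant additive effect estimates (hypothetical input)
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * w for d, w in zip(dosages, effect_sizes))


# Toy example with three variants (values invented for illustration only).
toy_dosages = [0, 1, 2]
toy_effects = [0.1, -0.2, 0.05]
toy_score = polygenic_score(toy_dosages, toy_effects)
```

Because each variant contributes independently, this form cannot capture epistatic (interaction) effects, which is the limitation the machine learning methods discussed here aim to address.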
Due to the enormous number of potential combinations of variants, it is difficult to integrate epistatic effects into PGS. In the third chapter, we present a method that restricts the possible combinations to variants lying within the boundaries of the same topologically associated domain (TAD), treating each TAD independently. Using UK Biobank data for the height phenotype, we included 17,560 variants in an artificial neural network (ANN) and compared the variance explained ($R^2$) by the PGS with and without knowledge of the TADs. We found that TAD information brings a significant improvement, with the average $R^2$ going from 0.287 to 0.293 (with a p-value $=10E-5$ for n=20). We conclude that it should be possible to build better PGS using ANNs and epistasis within TADs.
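To put the reported change in perspective, the two average $R^2$ values above can be converted into absolute and relative gains. The snippet below is simple arithmetic on the figures quoted in this abstract; the variable names are hypothetical.

```python
# Average R^2 values quoted above (without and with TAD knowledge).
r2_without_tads = 0.287
r2_with_tads = 0.293

# Absolute gain in variance explained, and gain relative to the baseline.
absolute_gain = r2_with_tads - r2_without_tads
relative_gain_percent = 100.0 * absolute_gain / r2_without_tads

print(f"absolute gain: {absolute_gain:.3f}")
print(f"relative gain: {relative_gain_percent:.2f}%")
```

The absolute gain is 0.006 in $R^2$, which corresponds to a relative gain of roughly 2% over the model without TAD knowledge.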
The effect of genetic ancestry on phenotypes is not taken into account in PGS-based risk estimates. Doing so could accelerate the adoption of genomic medicine for underrepresented populations and mixed-race individuals. The fourth chapter presents a method for integrating ancestry through a secondary score derived from genome-wide genotyping data, the PC score (PCS). We compared two models, one using only the PGS and the other using both the PGS and the PCS. Using the UK Biobank, we found an improvement in genetic prediction for all phenotypes tested: <10% for blood pressure, BMI and baldness, 16% for menarche age, 38% for height, 71% for menopausal age, 138% for bone mineral density, 350% for education and 2800% for skin color. These results were reproduced when the trained models were applied to an external cohort (Cohort Lausannoise).
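The two-model comparison described above can be sketched as a comparison of variance explained by a linear model on the PGS alone versus a linear model on the PGS and the PCS together. This is only an illustrative sketch under stated assumptions: the abstract does not specify the model form or the metric behind the reported percentages, so ordinary least squares and $R^2$ are assumptions here, and all names and toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_single(predictor, phenotype):
    """R^2 of a least-squares fit phenotype ~ predictor (squared correlation)."""
    return pearson(predictor, phenotype) ** 2


def r2_pair(pred_a, pred_b, phenotype):
    """R^2 of a least-squares fit phenotype ~ pred_a + pred_b.

    Uses the standard two-predictor identity based on pairwise correlations;
    it requires that the two predictors are not perfectly collinear.
    """
    r_a = pearson(pred_a, phenotype)
    r_b = pearson(pred_b, phenotype)
    r_ab = pearson(pred_a, pred_b)
    return (r_a ** 2 + r_b ** 2 - 2 * r_a * r_b * r_ab) / (1 - r_ab ** 2)


# Toy data, invented for illustration: the phenotype is exactly PGS + PCS.
toy_pgs = [0.0, 1.0, 2.0, 3.0]
toy_pcs = [1.0, -1.0, -1.0, 1.0]
toy_phenotype = [p + c for p, c in zip(toy_pgs, toy_pcs)]

r2_pgs_only = r2_single(toy_pgs, toy_phenotype)
r2_pgs_and_pcs = r2_pair(toy_pgs, toy_pcs, toy_phenotype)
relative_improvement_percent = 100.0 * (r2_pgs_and_pcs - r2_pgs_only) / r2_pgs_only
```

In this toy setting the combined model explains all of the variance by construction, so the numbers say nothing about real phenotypes; the sketch only shows how a relative improvement between the two models could be computed.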
Each advance in the understanding of complex traits and in the calculation of PGS has the potential to improve genomic medicine once these scores are used routinely in clinical practice. Over these four years, I have had the opportunity to contribute at several levels to this long-awaited evolution.