Abstract

Genome-wide association studies (GWASs) have demonstrated that most common diseases have a strong genetic component arising from many genetic variants, each with a small effect size. GWAS summary statistics have allowed the construction of polygenic scores (PGSs) that estimate part of an individual's risk for common diseases. Here, we propose to improve PGS-based risk estimation by incorporating genetic ancestry derived from genome-wide genotyping data. Our method involves three cohorts: a base (or discovery) cohort for association studies, a target cohort for phenotype/risk prediction, and a map cohort for ancestry mapping. It proceeds in three steps: (1) it generates, for each individual in the base and target cohorts, a set of principal components based on the map cohort, called mapped PCs; (2) it associates the phenotype with the mapped PCs in the base cohort; and (3) it uses the mapped PCs in the target cohort to generate a phenotypic predictor called the ancestry score. We evaluated the ancestry score by comparing a predictive model using a PGS alone with one combining a PGS and an ancestry score. First, we performed simulations and found that the ancestry score has a greater impact on traits that correlate with ancestry-specific variants. Second, we showed, using UK Biobank data, that the ancestry score improves genetic prediction for all nine phenotypes we examined, although to very different degrees. Third, we performed simulations and found that the more heterogeneous the base and target cohorts, the more beneficial the ancestry score is. Finally, we validated our approach under realistic conditions with UK Biobank as the base cohort and Swiss individuals from the CoLaus|PsyCoLaus study as the target cohort.
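
The three steps of the method can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the mapped PCs are computed or which association model is used, so the sketch assumes a plain SVD-based PCA on standardized map-cohort genotypes and an ordinary least-squares fit for a quantitative phenotype. All function names (fit_map_pcs, project, fit_ancestry_model, ancestry_score) are hypothetical.

```python
import numpy as np

def fit_map_pcs(G_map, n_pcs):
    """Fit PCA on the map cohort (rows: individuals, columns: variants).

    Assumption: genotypes are standardized with map-cohort means and
    standard deviations before the SVD.
    """
    mean = G_map.mean(axis=0)
    std = G_map.std(axis=0)
    std[std == 0] = 1.0  # guard against monomorphic variants
    Z = (G_map - mean) / std
    # Rows of Vt are the principal axes in variant space.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_pcs].T

def project(G, mean, std, loadings):
    """Step 1: mapped PCs of any cohort, using map-cohort scaling and axes."""
    return ((G - mean) / std) @ loadings

def fit_ancestry_model(pcs_base, y_base):
    """Step 2: associate the phenotype with the mapped PCs in the base cohort.

    Assumption: linear regression with an intercept.
    """
    X = np.column_stack([np.ones(len(pcs_base)), pcs_base])
    beta, *_ = np.linalg.lstsq(X, y_base, rcond=None)
    return beta

def ancestry_score(pcs_target, beta):
    """Step 3: predict the phenotype in the target cohort from mapped PCs."""
    return beta[0] + pcs_target @ beta[1:]
```

In such a setup, the ancestry score would then enter the predictive model as an additional covariate alongside the PGS, which is the comparison the evaluation describes (PGS alone versus PGS plus ancestry score).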

Details