Journal article

Quantifying Interdependent Risks in Genomic Privacy

The rapid progress in human-genome sequencing is leading to a high availability of genomic data. These data is notoriously very sensitive and stable in time, and highly correlated among relatives. In this article, we study the implications of these familial correlations on kin genomic privacy. We formalize the problem and detail efficient reconstruction attacks based on graphical models and belief propagation. With our approach, an attacker can infer the genomes of the relatives of an individual whose genome or phenotype are observed by notably relying on Mendel's Laws, statistical relationships between the genomic variants, and between the genome and the phenotype. We evaluate the effect of these dependencies on privacy with respect to the amount of observed variants and the relatives sharing them. We also study how the algorithmic performance evolves when we take these various relationships into account. Furthermore, to quantify the level of genomic privacy as a result of the proposed inference attack, we discuss possible definitions of genomic privacy metrics, and compare their values and evolution. Genomic data reveals Mendelian disorders and the likelihood of developing severe diseases, such as Alzheimer's. We also introduce the quantification of health privacy, specifically, the measure of how well the predisposition to a disease is concealed from an attacker. We evaluate our approach on actual genomic data from a pedigree and show the threat extent by combining data gathered from a genome-sharing website as well as an online social network.


Related material