Privacy-Preserving Computation of Disease Risk by Using Genomic, Clinical, and Environmental Data
According to many scientists and clinicians, genomics is the "next big thing" in medicine. On the one hand, the decreasing cost of genome sequencing has been paving the way to better preventive and personalized medicine. On the other hand, genomic data raises serious privacy concerns, as it is the ultimate identifier of an individual and contains privacy-sensitive information (e.g., disease predispositions, ancestry information). Thus, it is necessary to find ways of using genomic data without compromising the genomic privacy of individuals. To obtain a more comprehensive medical assessment, genomic information must be combined with other clinical and environmental data (such as demographic information, family history, disease history, and laboratory test results) that are also privacy-sensitive (e.g., the HIV status of an individual) and need to be treated as such. Focusing on disease risk tests, in this paper we propose a privacy-preserving system for storing and processing genomic, clinical, and environmental data by using homomorphic encryption and privacy-preserving integer comparison. We implement the proposed system using real patient data and reliable disease risk factors. In particular, we use 23 genetic and 14 clinical and environmental risk factors to compute the risk of coronary artery disease in a privacy-preserving way. Finally, we show the practicality of the proposed system via a complexity evaluation.
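To illustrate the idea of computing on encrypted risk factors, the following is a minimal sketch of a weighted sum evaluated under an additively homomorphic scheme. The abstract does not specify the encryption scheme, so a textbook Paillier construction is assumed here purely for illustration. The function names, the toy primes (far too small to be secure), and the example factor values and integer weights are all hypothetical and are not taken from the proposed system or its data.

```python
import math
import secrets

# Toy textbook Paillier (assumed scheme, illustration only; NOT secure at this key size).

def keygen(p, q):
    """Build a key pair from two distinct primes p and q, using g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
    mu = pow(lam, -1, n)           # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, n + 1), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer m with 0 <= m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover the plaintext integer from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def weighted_sum(pub, ciphertexts, weights):
    """Return an encryption of sum(w_i * x_i) without decrypting any x_i.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to a
    non-negative integer weight multiplies its plaintext by that weight.
    """
    n, _ = pub
    n2 = n * n
    acc = 1                                # trivial encryption of 0
    for c, w in zip(ciphertexts, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc

if __name__ == "__main__":
    # 65537 and 2**31 - 1 are both prime; chosen only to keep the demo small.
    pub, priv = keygen(65537, 2147483647)
    factors = [1, 0, 2]                    # hypothetical encoded risk-factor values
    weights = [3, 5, 2]                    # hypothetical integer-scaled weights
    encrypted = [encrypt(pub, x) for x in factors]
    score = weighted_sum(pub, encrypted, weights)
    print(decrypt(pub, priv, score))       # 1*3 + 0*5 + 2*2
```

In this sketch the party evaluating `weighted_sum` sees only ciphertexts, while only the holder of the private key can decrypt the resulting score. The privacy-preserving integer comparison mentioned in the abstract is a separate protocol and is not covered here.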