Student project

Using a random walker on gene expression and protein-protein interaction networks to prioritize candidate genes

Identifying the genes that underlie human diseases is an important step in understanding and treating genetic disorders. Based on the assumption that related diseases are caused by related genes, several candidate gene prioritization methods have been proposed to refine the lists of suspect genes obtained by linkage analysis or other methods. The large increase in publicly available -omics data has made it possible to implement prioritization methods that combine information from multiple data sources to produce better rankings. In this work, we present a new method for prioritizing candidate disease genes based on gene expression data that ranks 12851 genes for 5080 phenotypes. Its performance is comparable to that of previous methods, which used hand-curated protein-protein interaction data and were evaluated on smaller test sets. We also propose a method for combining multiple gene networks into a single network, with which we ranked up to 14612 genes for 5080 phenotypes, more than any previous method. Our evaluation shows that the fused network performs better than each of its component networks on its own.
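The abstract does not spell out the walk itself, so the following is only a minimal sketch of one common variant, a random walk with restart, and not necessarily the formulation used in this project. The gene names, edge weights, restart probability and the function name random_walk_with_restart are all hypothetical. The network is assumed to be undirected and given as a dictionary in which every gene appears as a key.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Return a visiting probability per gene for a walker that restarts at the seeds.

    adjacency: dict gene -> dict neighbor -> positive weight (assumed symmetric,
               every gene present as a key).
    seeds:     genes already known to be associated with the phenotype.
    restart:   probability of jumping back to a seed at each step (assumed value).
    """
    nodes = list(adjacency)
    seeds = [s for s in seeds if s in adjacency]
    strength = {u: sum(adjacency[u].values()) for u in nodes}
    # Start (and restart) distribution: uniform over the seed genes.
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if strength[u] == 0:
                # Isolated gene: send its probability mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - restart) * p[u] / len(seeds)
                continue
            for v, w in adjacency[u].items():
                # Move along an edge with probability proportional to its weight.
                nxt[v] += (1 - restart) * p[u] * w / strength[u]
        change = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a chain A - B - C - D with unit weights.
network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(network, seeds=["A"])
candidates = ["C", "D"]
# Candidates visited more often by the walker are ranked first.
ranked = sorted(candidates, key=lambda g: scores[g], reverse=True)
print(ranked)
```

In this toy chain, candidate C lies closer to the seed gene A than candidate D, so the walker visits it more often and it is ranked first. Edge weights could come from any gene network, for example expression similarity or protein-protein interactions; how such weights are derived or fused is not covered by this sketch.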
