Mining gene sets for measuring similarities
In recent years, the development of high-throughput devices for the massively parallel analysis of genomic data has led to the generation of large amounts of new biological evidence and has triggered the proliferation of data mining algorithms for extracting meaningful information. Microarrays for gene expression analysis are part of this revolution and provide important insight into molecular biology, often in the form of coherent sets of genes representing previously uncharacterized processes. Large amounts of data are continuously produced in this form, and computational approaches can significantly improve the use of these results, since comparisons among many gene sets can yield new, meaningful information at no additional experimental cost. To address this opportunity, we designed and implemented FIT, a scalable, unsupervised algorithm that quantitatively compares different populations of gene sets using two distinct measures of similarity between any two gene sets. These measures are then used to obtain a summary statistic that describes the tightness of fit between sets belonging to two distinct populations of gene sets. We present the results of FIT on two data sets for the study of lymphoma and acute lymphoblastic leukemia. In both cases, FIT was able to recapitulate the previous analyses of these data sets, to extend their results, and to extract information likely to offer insights into the underlying biology.
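The comparison described above can be illustrated with a minimal sketch. The abstract does not name the two similarity measures or the summary statistic used by FIT, so everything here is an assumption made for illustration: the Jaccard index and the overlap coefficient stand in for the two measures, the mean best-match similarity stands in for the tightness-of-fit statistic, and the function names and gene sets are hypothetical.

```python
from statistics import mean


def jaccard(a, b):
    """Stand-in measure 1 (assumption): shared genes over all genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Stand-in measure 2 (assumption): shared genes over the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def tightness_of_fit(pop_a, pop_b, measure):
    """Hypothetical summary statistic, not FIT's actual one.

    For each gene set in pop_a, take its best similarity to any gene set
    in pop_b, then average those best-match scores.
    """
    if not pop_a or not pop_b:
        return 0.0
    return mean(max(measure(a, b) for b in pop_b) for a in pop_a)


# Illustrative gene sets only; they are not taken from the data sets in the paper.
population_a = [{"TP53", "MYC", "BCL2"}, {"CD19", "CD79A"}]
population_b = [{"TP53", "MYC"}, {"CD19", "CD79A", "PAX5"}, {"GATA1"}]

fit_jaccard = tightness_of_fit(population_a, population_b, jaccard)
fit_overlap = tightness_of_fit(population_a, population_b, overlap_coefficient)
print(fit_jaccard, fit_overlap)
```

Using two measures side by side shows why more than one may be useful: the overlap coefficient rewards a small set that is fully contained in a larger one, while the Jaccard index penalizes the size difference.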
Record created on 2006-04-06, modified on 2016-08-08