Gene Expression Data Analysis Using a Novel Approach to Biclustering Combining Discrete and Continuous Data
Many methods exist for detecting patterns in gene expression data. In contrast to classical clustering methods, biclustering can cluster a group of genes together with a group of conditions (replicates, sets of patients, or drug compounds). However, because the problem is NP-hard, most algorithms rely on heuristic search functions and might therefore converge toward local maxima. By using the results of biclustering on discrete data as the starting point for a local search function on continuous data, our algorithm avoids the problem of heuristic initialization. Similar to Order-Preserving Submatrices (OPSM), our algorithm aims to detect biclusters whose rows and columns can be ordered such that row values increase across the bicluster's columns and column values increase across its rows. Results were generated on the yeast genome (Saccharomyces cerevisiae), a human cancer data set, and random data. On the yeast genome, 89 percent of the 100 biggest nonoverlapping biclusters were enriched with Gene Ontology annotations. A comparison with OPSM and the Iterative Signature Algorithm (ISA, a generalization of singular value decomposition) demonstrated better efficiency when using gene and condition orders. We present results on random and real data sets that show the ability of our algorithm to capture statistically significant and biologically relevant biclusters.
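To make the targeted pattern concrete, the following minimal Python sketch tests whether a candidate bicluster satisfies the ordering property described above: a column order under which every row is non-decreasing, and a row order under which every column is non-decreasing. It illustrates the pattern only and is not the search algorithm itself. The function names `monotone_order` and `order_preserving_bicluster`, the list-of-lists matrix layout, and the use of non-strict (tie-tolerant) ordering are assumptions made for this example.

```python
def monotone_order(vectors):
    """Return an ordering of positions under which every vector is
    non-decreasing, or None if no such ordering exists.

    Assumes `vectors` is a non-empty list of equal-length sequences.
    If a valid ordering exists, sorting positions by their total across
    all vectors yields one, so a single sort plus a check is enough.
    """
    n = len(vectors[0])
    order = sorted(range(n), key=lambda j: sum(v[j] for v in vectors))
    for v in vectors:
        if any(v[order[k]] > v[order[k + 1]] for k in range(n - 1)):
            return None
    return order


def order_preserving_bicluster(matrix, rows, cols):
    """Check the candidate bicluster given by `rows` and `cols`.

    Returns (ordered_rows, ordered_cols) using the original matrix
    indices if the submatrix can be ordered so that values increase
    along both rows and columns, otherwise None.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]

    # Column order that makes every row non-decreasing.
    col_order = monotone_order(sub)
    if col_order is None:
        return None

    # Row order that makes every column non-decreasing (work on the transpose).
    transposed = [list(column) for column in zip(*sub)]
    row_order = monotone_order(transposed)
    if row_order is None:
        return None

    return [rows[i] for i in row_order], [cols[j] for j in col_order]


# Hypothetical toy expression matrix (genes in rows, conditions in columns).
data = [
    [5, 1, 3, 9],
    [6, 2, 4, 0],
    [9, 9, 9, 9],
    [8, 4, 6, 7],
]

ordered = order_preserving_bicluster(data, rows=[0, 1, 3], cols=[0, 1, 2])
rejected = order_preserving_bicluster(data, rows=[0, 1], cols=[0, 1, 2, 3])
```

In this toy matrix, genes 0, 1, and 3 share the same ranking of conditions 0 to 2, so the first candidate is accepted, whereas adding condition 3 breaks the common ordering for genes 0 and 1 and the second candidate is rejected. Floating-point expression values would work the same way, although ties and measurement noise would likely call for a tolerance that this sketch does not model.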