A Pattern-Mining Method for High-Throughput Lab-on-a-Chip Data Analysis
Biochips are emerging as a useful tool for high-throughput acquisition of biological data, and they continue to improve in information quality and to find new applications. Recent advances include CMOS-based integrated biosensor arrays for deoxyribonucleic acid (DNA) expression analysis (Hassibi and Lee, 2005), (Schienle et al., 2004), and active research is ongoing into the miniaturization and integration of protein microarrays (Kiyonaka et al., 2004), (Rubina et al., 2003), (Scrivener et al., 2003), tissue microarrays (TMAs) (Chen et al., 2004), (Shergill et al., 2004), and fluorescence-based multiplexed cytokine immunoassays (Wang et al., 2002). The main advantages of microfluidic lab-on-a-chip devices include ease of use, speed of analysis, low sample and reagent consumption, and high reproducibility due to standardization and automation. Without effective data-analysis methods, however, the merit of acquiring massive data through biochips will be marginal. The high-dimensional nature of such data requires novel techniques that cope with the curse of dimensionality better than conventional data-analysis approaches. In this paper, the authors propose a pattern-mining method to analyze large-scale biological data obtained from high-throughput biochip experiments. In particular, when a data set is given as a matrix, the method finds patterns that appear as (possibly overlapping) submatrices of the input matrix. The method exploits techniques developed for the symbolic manipulation of Boolean functions. Building on these techniques, it can find all patterns in a given data matrix that satisfy specified input parameters. The authors tested the method on several large-scale biochip data sets and observed that it outperforms the alternatives in terms of efficiency and the number of patterns discovered.
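To make the notion of a submatrix pattern concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the data matrix has already been binarized (1 = feature present), and it enumerates column subsets by brute force rather than using symbolic Boolean-function manipulation, so it is practical only for toy inputs. The function name `find_patterns` and the parameters `min_rows` and `min_cols` are hypothetical stand-ins for the "specific input parameters" mentioned above.

```python
from itertools import combinations


def find_patterns(matrix, min_rows=2, min_cols=2):
    """Return all maximal all-ones submatrices as (rows, cols) index tuples.

    Brute-force illustration only: it tries every column subset, which is
    exponential in the number of columns. Assumes a binary (0/1) matrix.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    patterns = set()
    for k in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            # Rows that contain a 1 in every chosen column.
            rows = tuple(
                r for r in range(n_rows) if all(matrix[r][c] for c in cols)
            )
            if len(rows) < min_rows:
                continue
            # Extend the column set to every column that is 1 on all of
            # those rows, so that the reported pattern is maximal.
            closed = tuple(
                c for c in range(n_cols) if all(matrix[r][c] for r in rows)
            )
            patterns.add((rows, closed))
    return sorted(patterns)


# Hypothetical toy data: 4 samples (rows) by 4 features (columns).
data = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
patterns = find_patterns(data, min_rows=2, min_cols=2)
for rows, cols in patterns:
    print("rows", rows, "cols", cols)
```

In this toy matrix the reported patterns overlap: rows 0 and 3 and columns 0 and 1 belong to more than one submatrix, which is the situation the abstract refers to as possibly overlapping submatrices.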