ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences

One of the problems in the large-scale analysis of unannotated, low-quality EST sequences is detecting coding regions and correcting the frameshift errors that they often contain. We introduce a new type of hidden Markov model that explicitly accounts for possible errors in the sequence being analyzed and incorporates a method for correcting them. This model was implemented in an efficient and robust program, ESTScan. We show that ESTScan can detect and extract coding regions from low-quality sequences with high selectivity and sensitivity, and that it can accurately correct frameshift errors. In the framework of genome sequencing projects, ESTScan could become a very useful tool for gene discovery, for quality control, and for the assembly of contigs representing the coding regions of genes.

Published in:
Proc Int Conf Intell Syst Mol Biol, 138-148
ISMB 1999 best paper award

 Record created 2007-12-17, last modified 2018-03-17
