Noisy Text Categorization

This work presents a system for categorizing noisy texts. By noisy we mean any text obtained through an error-prone extraction process from media other than digital text. We show that, even with an average Word Error Rate of around 50%, the loss in categorization performance relative to the clean version of the same documents is negligible.
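For readers unfamiliar with the metric, the sketch below shows how a Word Error Rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a clean reference and its noisy version, divided by the number of reference words. This is the conventional definition, assumed here for illustration; the record does not state how the figure of around 50% was measured. The function name `word_error_rate`, the whitespace tokenization, and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are whitespace-separated tokens; the record does not
    describe its own tokenization or scoring procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    clean = "the cat sat on the mat"
    noisy = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER = {word_error_rate(clean, noisy):.2f}")
```

Under this definition, a rate of 50% corresponds to roughly one word-level error for every two reference words, and the value can exceed 100% when the noisy text contains many inserted words.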


Year: 2003
Publisher: IDIAP

 Record created 2006-03-10, last modified 2018-01-27
