Data binarization by discriminant elimination
This paper addresses the problem of constructing a mapping from an arbitrary input space $\Input$ into a binary output space $\Bin^\BinDim$, based on a given data set $\DataSet \subset \Input$ partitioned into classes. The aim is to reduce the total amount of information while retaining the part most relevant to the partition. An additional constraint is that the mapping must have a simple interpretation: each of the $\BinDim$ discriminants is related to a single original attribute (for example, linear combinations of original attributes are not admitted). Beyond data compression, the targeted application is preprocessing for classification techniques that require Boolean input data. While existing techniques for this problem, such as decision trees, are constructive and increase $\BinDim$ iteratively, the method proposed here starts with a very large dimension $\BinDim$ and reduces it iteratively.
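To make the "start large, then eliminate" idea concrete, here is a minimal Python sketch. It is not the report's algorithm: the abstract does not state the elimination criterion, so the sketch assumes numeric attributes, single-attribute threshold discriminants, and a simple greedy rule that drops a discriminant whenever the remaining ones still give points from different classes different binary codes. The function names (`candidate_thresholds`, `binarize`, `separates`, `eliminate`) are hypothetical.

```python
def candidate_thresholds(data):
    """Build the large initial set: one (attribute, threshold) pair for each
    midpoint between consecutive distinct values of each attribute."""
    candidates = []
    for j in range(len(data[0])):
        values = sorted({row[j] for row in data})
        for lo, hi in zip(values, values[1:]):
            candidates.append((j, (lo + hi) / 2))
    return candidates


def binarize(row, discriminants):
    """Map one input point to a binary vector, one bit per discriminant."""
    return tuple(int(row[j] > t) for j, t in discriminants)


def separates(data, labels, discriminants):
    """Assumed criterion: True if no two points from different classes
    receive the same binary code."""
    seen = {}
    for row, label in zip(data, labels):
        code = binarize(row, discriminants)
        if seen.setdefault(code, label) != label:
            return False
    return True


def eliminate(data, labels):
    """Greedy backward elimination: try dropping each discriminant in turn
    and keep the smaller set whenever it still separates the classes."""
    kept = candidate_thresholds(data)
    for d in list(kept):
        trial = [k for k in kept if k != d]
        if separates(data, labels, trial):
            kept = trial
    return kept
```

Each surviving discriminant is a single test of the form "attribute j exceeds threshold t", which matches the interpretability constraint stated above. The result of this greedy pass depends on the order in which discriminants are visited, so it is one possible reduced set rather than a minimal one.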