Data binarization by discriminant elimination

Moreira, Miguel; Hertz, Alain; Mayoraz, Eddy

conference paper

Editors: Bruha, Ivan; Bohanec, Marco

1999

Proceedings of the ICML-99 Workshop: From Machine Learning to Knowledge Discovery in Databases

This paper is concerned with the problem of constructing a mapping from an arbitrary input space $\mathcal{X}$ into a binary output space $\{0,1\}^d$, based on a given data set $D \subset \mathcal{X}$ partitioned into classes. The aim is to reduce the total amount of information while keeping the part most relevant to the partition into classes. An additional constraint on our problem is that the mapping must have a simple interpretation: each of the $d$ discriminants is related to a single original attribute (e.g. linear combinations of original attributes are not admitted). Beyond data compression, the targeted application is preprocessing for classification techniques that require Boolean input data. While other existing techniques for this problem are constructive (they increase $d$ iteratively, as decision trees do), the method proposed here starts with a very large dimension $d$ and reduces it iteratively.
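The abstract only outlines the elimination idea. The following is a minimal sketch under loudly stated assumptions, not the authors' algorithm: attributes are numeric, each discriminant is a threshold test $x_j > t$ on a single attribute $j$, the large initial set contains a threshold at every midpoint between consecutive observed values, and a discriminant may be dropped whenever the remaining ones stay consistent (no two examples of different classes share a binary code). The function names `candidate_discriminants`, `binarize`, `is_consistent` and `eliminate_discriminants` are hypothetical.

```python
# Hypothetical sketch of binarization by discriminant elimination.
# Assumptions (not from the paper): numeric attributes only, thresholds at
# midpoints of consecutive observed values, and "consistency" as the
# criterion that decides whether a discriminant may be dropped.


def candidate_discriminants(X):
    """All (attribute, threshold) pairs at midpoints: the large initial set."""
    discs = []
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            discs.append((j, (lo + hi) / 2))
    return discs


def binarize(X, discs):
    """Map each example to a 0/1 tuple: bit k is 1 iff x_j > t for discs[k] = (j, t)."""
    return [tuple(int(row[j] > t) for j, t in discs) for row in X]


def is_consistent(X, y, discs):
    """True if no two examples of different classes receive the same binary code."""
    seen = {}
    for code, label in zip(binarize(X, discs), y):
        if seen.setdefault(code, label) != label:
            return False
    return True


def eliminate_discriminants(X, y):
    """Start from all candidates and drop each one whose removal keeps consistency."""
    discs = candidate_discriminants(X)
    if not is_consistent(X, y, discs):
        raise ValueError("identical attribute vectors carry different classes")
    for d in list(discs):  # iterate over a snapshot; discs shrinks as we go
        trial = [e for e in discs if e != d]
        if is_consistent(X, y, trial):
            discs = trial
    return discs


if __name__ == "__main__":
    X = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0), (4.0, 1.0)]
    y = ["a", "a", "b", "b"]
    kept = eliminate_discriminants(X, y)
    print(kept)               # [(0, 2.5)]
    print(binarize(X, kept))  # [(0,), (0,), (1,), (1,)]
```

Because dropping a discriminant can only merge binary codes, a discriminant that cannot be removed at some step can never be removed later, so one pass already yields an irredundant set. Which irredundant set is reached depends on the order in which discriminants are tried; a real method would presumably order them by some relevance measure, which this sketch does not attempt.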

File: rr99-04.pdf (Adobe PDF, 351.89 KB, open access; MD5 checksum ab93eeab5a91bc32f95b07dd5005a8dc)