Grammar-Based Tree Compression

Grammar-based compression means to find a small grammar that generates a given object. Such a grammar reveals the structure of the object (according to the grammar formalism used); the main advantage of this compression method is that the resulting grammar can often be used in further computations without prior decompression. A linear time bottom-up algorithm is presented which transforms a tree into a particular context-free tree grammar. For common XML documents the algorithm performs well, compressing the tree structure to about 5 per cent of the original size. The validation of an XML document against an XML type can be done without decompression, in linear time w.r.t. the size of the grammar (for a fixed type). While the involved grammars can be double exponentially smaller than the represented trees, testing them for equivalence can be done in polynomial space w.r.t. the sum of their sizes.


    • EPFL-REPORT-52615

    Record created on 2005-07-13, modified on 2016-08-08

Related material