Best compression algorithm for XML?

There is a W3C standard (not yet released at the time of writing) named EXI (Efficient XML Interchange).

It is meant to become THE data format for compressing XML data in the future (it is claimed to be the last binary format that will be needed). Being optimized for XML, it compresses XML far more efficiently than any conventional compression algorithm.

With EXI, you can operate on compressed XML data on the fly (without the need to uncompress or re-compress it).

EXI = (XML + XMLSchema) as binary.
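
To make the "XML + schema as binary" idea concrete, here is a toy sketch of why a schema shared by both sides helps. It is NOT the EXI format and does not use any EXI library; `SCHEMA`, `encode`, `decode` and `to_xml` are hypothetical names made up for this illustration. The only point is that when sender and receiver already know the field names, order and types, just the values have to travel.

```python
import struct

# Hypothetical "schema" known to both sender and receiver:
# field order and types (unsigned 32-bit int, 64-bit float).
SCHEMA = [("id", "I"), ("price", "d")]
FORMAT = "<" + "".join(code for _, code in SCHEMA)

def encode(record: dict) -> bytes:
    """Pack only the values; names and types come from the shared schema."""
    return struct.pack(FORMAT, *(record[name] for name, _ in SCHEMA))

def decode(payload: bytes) -> dict:
    """Rebuild the record by pairing unpacked values with schema names."""
    values = struct.unpack(FORMAT, payload)
    return {name: value for (name, _), value in zip(SCHEMA, values)}

def to_xml(record: dict) -> str:
    """Plain-text XML rendering of the same record, for size comparison."""
    fields = "".join(f"<{k}>{v}</{k}>" for k, v in record.items())
    return f"<item>{fields}</item>"

record = {"id": 42, "price": 19.5}
binary = encode(record)
print(len(to_xml(record)), len(binary))  # text size vs. schema-informed size
```

Real EXI is considerably more elaborate than this, so treat the sketch only as intuition for where the savings come from.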

And here is an open-source implementation (I don't know whether it is stable yet):
EXIficient

Yes, *.zip is best in practice. The gory details are in this USENIX paper, which shows that optimal compressors are not worth the computational cost and that domain-specific compressors don't beat zip [on average].
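
If you want to check how far general-purpose compression gets you on your own data, a minimal sketch using Python's standard `zlib` module (DEFLATE, the algorithm commonly used inside zip files) could look like this. The sample document built by `build_sample_xml` is made-up, highly repetitive test data, so the ratio it produces says nothing about real-world files.

```python
import zlib

def build_sample_xml(n_records: int) -> bytes:
    """Build a repetitive XML document (hypothetical sample data)."""
    rows = "".join(
        f'<item id="{i}"><name>widget{i}</name><price>{i * 1.5:.2f}</price></item>'
        for i in range(n_records)
    )
    return f"<?xml version='1.0'?><catalog>{rows}</catalog>".encode("utf-8")

def deflate_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size, using DEFLATE."""
    return len(zlib.compress(data, level)) / len(data)

xml_bytes = build_sample_xml(1000)
compressed = zlib.compress(xml_bytes, 9)
restored = zlib.decompress(compressed)  # round trip back to the original bytes

print(len(xml_bytes), len(compressed), restored == xml_bytes)
```

Swap in the bytes of a real file for `xml_bytes` to get a number that means something for your use case.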

Disclaimer: I wrote that paper, which has been cited 60+ times according to Google.

Another way to compress XML would be FI (Fast Infoset).

XML stored as FI contains every tag and attribute name only once;
all other occurrences reference the first one,
thus saving space.
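
The "store each name once, reference it afterwards" idea can be sketched in a few lines. This is only a toy illustration and not the actual Fast Infoset encoding: `index_names`, the event tuples and the in-memory table are all assumptions made for the example, and text that follows a closing tag (tail text) is ignored for brevity.

```python
import xml.etree.ElementTree as ET

def index_names(xml_text: str):
    """Replace each tag/attribute name with an index into a name table."""
    table = []      # each distinct name is stored once, in first-seen order
    positions = {}  # name -> index in table
    events = []     # event tuples that refer to names by index

    def ref(name):
        # Register the name on first sight, then always return its index.
        if name not in positions:
            positions[name] = len(table)
            table.append(name)
        return positions[name]

    def walk(elem):
        events.append(("start", ref(elem.tag)))
        for attr, value in elem.attrib.items():
            events.append(("attr", ref(attr), value))
        if elem.text and elem.text.strip():
            events.append(("text", elem.text.strip()))
        for child in elem:
            walk(child)
        events.append(("end",))

    walk(ET.fromstring(xml_text))
    return table, events

doc = '<list><item id="1">a</item><item id="2">b</item></list>'
table, events = index_names(doc)
print(table)   # ['list', 'item', 'id']
print(events)
```

Both `item` elements end up as `("start", 1)` in the event stream, so the name itself is written only once no matter how often it repeats.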

See:

Very good article on java.sun.com, and of course
the Wikipedia entry

From the compression point of view, the difference from EXI is that Fast Infoset
is less efficient (it is a binary encoding too, but it relies mainly on indexing repeated strings).

The other important difference
is that FI is a mature standard with many implementations.
One of them is the Fast Infoset Project @ dev.java.net
