Lempel‐Ziv compression of highly structured documents
Joaquín Adiego,
Gonzalo Navarro and
Pablo de la Fuente
Journal of the American Society for Information Science and Technology, 2007, vol. 58, issue 4, 461-478
Abstract:
The authors describe Lempel‐Ziv to Compress Structure (LZCS), a novel Lempel–Ziv approach suitable for compressing structured documents. LZCS takes advantage of repeated substructures that may appear in the documents by replacing each one with a backward reference to its previous occurrence. The result of the LZCS transformation is still a valid structured document, which is human‐readable and can be transmitted over ASCII channels. Moreover, LZCS‐transformed documents are easy to search, display, access at random, and navigate. In a second stage, the transformed documents can be further compressed either with any semistatic technique, so that all those operations can still be performed efficiently, or with any adaptive technique, to boost compression. LZCS is especially efficient in the compression of collections of highly structured data, such as Extensible Markup Language (XML) forms, invoices, e‐commerce, and Web‐service exchange documents. The comparison with other structure‐aware and standard compressors shows that LZCS is a competitive choice for this type of document, whereas the others are not well‐suited to support navigation or random access. When joined to an adaptive compressor, LZCS obtains by far the best compression ratios.
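The core idea in the abstract, replacing a repeated substructure with a backward reference to its earlier occurrence while keeping the result a valid structured document, can be illustrated with a minimal Python sketch. This is not the authors' LZCS implementation. The `ref` tag, its `to` attribute, the numbering of first occurrences in document order, and the function names are all assumptions made for illustration, and the sketch replaces every repeated subtree regardless of size, which a practical compressor would likely avoid for very small subtrees. It also assumes the input contains no element already named `ref`.

```python
import copy
import xml.etree.ElementTree as ET

REF_TAG = "ref"  # hypothetical tag used for backward references


def subtree_key(elem):
    """Hashable description of a subtree (the element's own tail is excluded)."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        elem.text or "",
        tuple((subtree_key(c), c.tail or "") for c in elem),
    )


def compress(root):
    """Return a copy of the tree with repeated subtrees replaced by <ref to="k"/>."""
    seen = {}  # subtree key -> id, assigned in order of first occurrence

    def visit(elem):
        key = subtree_key(elem)
        if key in seen:
            out = ET.Element(REF_TAG, {"to": str(seen[key])})
        else:
            seen[key] = len(seen)  # register before descending (preorder)
            out = ET.Element(elem.tag, dict(elem.attrib))
            out.text = elem.text
            for child in elem:
                out.append(visit(child))
        out.tail = elem.tail
        return out

    return visit(root)


def decompress(root):
    """Invert compress(): expand every <ref to="k"/> into a copy of subtree k."""
    table = []  # id -> expanded subtree, same numbering as in compress()

    def visit(elem):
        if elem.tag == REF_TAG:
            out = copy.deepcopy(table[int(elem.get("to"))])
        else:
            out = ET.Element(elem.tag, dict(elem.attrib))
            table.append(out)  # register before descending (preorder)
            out.text = elem.text
            for child in elem:
                out.append(visit(child))
        out.tail = elem.tail
        return out

    return visit(root)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<orders>"
        "<order><item sku='a1'>pen</item><ship>std</ship></order>"
        "<order><item sku='a1'>pen</item><ship>std</ship></order>"
        "</orders>"
    )
    packed = compress(doc)
    print(ET.tostring(packed, encoding="unicode"))
    print(ET.tostring(decompress(packed), encoding="unicode"))
```

Because the transformed tree is itself an ordinary XML document, it can be serialized as text and then passed to a second-stage compressor, which is the two-stage arrangement the abstract describes.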
Date: 2007
Downloads: (external link)
https://doi.org/10.1002/asi.20496
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:58:y:2007:i:4:p:461-478
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890
More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.