Lempel‐Ziv compression of highly structured documents

Joaquín Adiego, Gonzalo Navarro and Pablo de la Fuente

Journal of the American Society for Information Science and Technology, 2007, vol. 58, issue 4, 461-478

Abstract: The authors describe Lempel‐Ziv to Compress Structure (LZCS), a novel Lempel‐Ziv approach suitable for compressing structured documents. LZCS takes advantage of repeated substructures that may appear in the documents by replacing them with a backward reference to their previous occurrence. The result of the LZCS transformation is still a valid structured document, which is human‐readable and can be transmitted over ASCII channels. Moreover, LZCS‐transformed documents are easy to search, display, access at random, and navigate. In a second stage, the transformed documents can be further compressed using any semistatic technique, so that all of those operations can still be performed efficiently, or with any adaptive technique to boost compression. LZCS is especially efficient in the compression of collections of highly structured data, such as extensible markup language (XML) forms, invoices, e‐commerce, and Web‐service exchange documents. The comparison with other structure‐aware and standard compressors shows that LZCS is a competitive choice for this type of document, whereas the others are not well suited to support navigation or random access. When joined to an adaptive compressor, LZCS obtains by far the best compression ratios.
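
The core idea in the abstract, replacing a repeated substructure with a backward reference while keeping the result a valid structured document, can be illustrated with a small sketch. This is not the authors' LZCS format, which the abstract does not specify. It is a toy under stated assumptions: the reference element name `lzcs-ref`, its `to` attribute, and the preorder numbering of first occurrences are all hypothetical choices made for this example, and the input is assumed not to use `lzcs-ref` as a tag of its own.

```python
import copy
import xml.etree.ElementTree as ET

REF_TAG = "lzcs-ref"  # hypothetical tag name, not taken from the paper


def subtree_key(elem):
    """Hashable description of a subtree (tag, attributes, text, children)."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        elem.text or "",
        tuple((subtree_key(child), child.tail or "") for child in elem),
    )


def compress(root):
    """Replace each repeated subtree with a reference to its first occurrence."""
    seen = {}  # subtree key -> preorder number of its first occurrence

    def visit(elem):
        key = subtree_key(elem)
        if key in seen:
            # Repeated substructure: emit a backward reference instead.
            return ET.Element(REF_TAG, {"to": str(seen[key])})
        seen[key] = len(seen)  # number first occurrences in preorder
        out = ET.Element(elem.tag, dict(elem.attrib))
        out.text = elem.text
        for child in elem:
            new_child = visit(child)
            new_child.tail = child.tail
            out.append(new_child)
        return out

    return visit(root)


def decompress(root):
    """Expand references by copying the subtree they point back to."""
    table = []  # preorder number -> already rebuilt element

    def visit(elem):
        if elem.tag == REF_TAG:
            return copy.deepcopy(table[int(elem.get("to"))])
        out = ET.Element(elem.tag, dict(elem.attrib))
        out.text = elem.text
        table.append(out)  # same preorder numbering as in compress()
        for child in elem:
            new_child = visit(child)
            new_child.tail = child.tail
            out.append(new_child)
        return out

    return visit(root)


# Made-up sample document with one repeated <item> subtree.
doc = ET.fromstring(
    "<orders>"
    "<item><sku>A1</sku><qty>2</qty></item>"
    "<item><sku>A1</sku><qty>2</qty></item>"
    "</orders>"
)
packed = compress(doc)
restored = decompress(packed)
```

In this sketch the transformed tree is still ordinary XML, so it can be parsed, searched, and navigated with standard tools, which mirrors the property the abstract emphasizes. A practical scheme would presumably replace a subtree only when the reference is shorter than the text it stands for; the toy above replaces every repeat regardless of size.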

Date: 2007

Downloads: (external link)
https://doi.org/10.1002/asi.20496

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:58:y:2007:i:4:p:461-478

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890

More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology

Handle: RePEc:bla:jamist:v:58:y:2007:i:4:p:461-478