Compression of continuous prose texts using variety generation
David Cooper, Michael A. Emly, Michael F. Lynch and A. Robin Yeates
Journal of the American Society for Information Science, 1980, vol. 31, issue 3, 201-207
Abstract:
The use of variety‐generation techniques for text compression depends on the selection of symbol sets: sets of variable‐length character strings that occur approximately equifrequently in the text in question. For the method to perform efficiently in a variety of situations, the symbol set must be reasonably independent of the particular text used to generate it. Hence, texts of different origins must be similar in their microstructure for the technique to work well. Texts of American English varying in subject and style have been found to fulfill this condition. On average, the texts can be represented with a space saving of just over 50% relative to a fixed‐length 8‐bit representation of the characters. The best results are obtained using a symbol set generated from a sample of the complete data base, although results from subsets of the data base are almost as good.
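The abstract does not spell out how a symbol set is generated. As a rough illustration only, the Python sketch below builds a set of variable-length strings by repeatedly merging the most frequent adjacent pair of existing symbols (a byte-pair-encoding-style heuristic substituted here; it is not the authors' variety-generation procedure). It then parses text by greedy longest match, gives every symbol a fixed-length code of ceil(log2 N) bits for a set of N symbols, and reports the space saving against 8 bits per character. All function names, the set size of 64 and the sample text are hypothetical.

```python
import math
from collections import Counter


def greedy_parse(text, symbols, max_len):
    """Split text into symbols, always taking the longest symbol that matches."""
    out = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in symbols:
                out.append(piece)
                i += n
                break
        else:
            # Only happens if a character of text is missing from the symbol set.
            raise ValueError(f"no symbol matches at position {i}")
    return out


def build_symbol_set(sample, size):
    """Grow a symbol set of up to `size` strings from a sample text.

    Assumption: a BPE-style merge of the most frequent adjacent symbol pair,
    used as a stand-in for the paper's variety-generation selection. Merging
    frequent pairs tends to flatten the frequency distribution of symbols.
    """
    symbols = set(sample)  # all single characters, so parsing never fails
    while len(symbols) < size:
        max_len = max(len(s) for s in symbols)
        parsed = greedy_parse(sample, symbols, max_len)
        pairs = Counter(a + b for a, b in zip(parsed, parsed[1:]))
        candidates = [(count, pair) for pair, count in pairs.items()
                      if pair not in symbols]
        if not candidates:
            break
        _, best = max(candidates)  # most frequent new pair (ties: largest string)
        symbols.add(best)
    return symbols


def encode(text, symbols):
    """Map text to a list of integer codes, one per parsed symbol."""
    table = {s: i for i, s in enumerate(sorted(symbols))}
    max_len = max(len(s) for s in symbols)
    return [table[s] for s in greedy_parse(text, symbols, max_len)]


def decode(codes, symbols):
    """Invert encode() using the same sorted symbol ordering."""
    inverse = sorted(symbols)
    return "".join(inverse[c] for c in codes)


def space_saving(text, symbols):
    """Fraction saved versus 8 bits per character, using fixed-length codes."""
    bits_per_symbol = max(1, math.ceil(math.log2(len(symbols))))
    coded_bits = len(encode(text, symbols)) * bits_per_symbol
    return 1 - coded_bits / (8 * len(text))


if __name__ == "__main__":
    sample = "the theory of the thesis is that the theme is there " * 20
    symbols = build_symbol_set(sample, 64)
    codes = encode(sample, symbols)
    assert decode(codes, symbols) == sample
    print(len(symbols), "symbols,", f"saving {space_saving(sample, symbols):.1%}")
```

Because every single character of the sample stays in the set, any text drawn from the same character repertoire can be parsed with it; a text containing a character absent from the sample makes `greedy_parse` raise `ValueError`. This is one concrete way a symbol set can depend on the text used to generate it.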
Date: 1980
Downloads: https://doi.org/10.1002/asi.4630310312
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:31:y:1980:i:3:p:201-207
Ordering information: This journal article can be ordered from https://doi.org/10.1002/(ISSN)1097-4571
More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.