In situ generation of compressed inverted files

Moffat, Alistair; Bell, Timothy A. H.

Journal of the American Society for Information Science, 1995, vol. 46, issue 7, 537-550

Abstract: An inverted index stores, for each term that appears in a collection of documents, a list of document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a nontrivial task. Simple methods using in‐memory data structures cannot be used for large collections because they require too much random access storage, and traditional disk‐based methods require large amounts of temporary file space. This paper describes a new indexing algorithm designed to create large compressed inverted indexes in situ. It makes use of simple compression codes for the positive integers and an in‐place external multi‐way mergesort. The new technique has been used to invert a two‐gigabyte text collection in under 4 hours, using less than 40 megabytes of temporary disk space, and less than 20 megabytes of main memory. © 1995 John Wiley & Sons, Inc.
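Illustration: the abstract defines an inverted index and mentions "simple compression codes for the positive integers" without naming them. The sketch below is only an illustration under stated assumptions. It builds a small in-memory index, stores each list as gaps between successive document numbers, and encodes the gaps with the Elias gamma code, chosen here as one well-known code for positive integers and not necessarily the one used in the paper. All function names are hypothetical, and the in-place external multi-way mergesort is not shown.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the ascending list of document numbers containing it."""
    index = defaultdict(list)
    for doc_num, text in enumerate(docs, start=1):
        # set() ensures a document number is recorded once per term.
        for term in set(text.lower().split()):
            index[term].append(doc_num)
    return dict(index)


def to_gaps(postings):
    """Replace ascending document numbers with differences between neighbours."""
    return [cur - prev for prev, cur in zip([0] + postings, postings)]


def gamma_encode(n):
    """Elias gamma code of a positive integer, as a string of '0' and '1'.

    Assumption: gamma is used purely as an example of a simple integer code.
    """
    if n < 1:
        raise ValueError("gamma codes are defined for positive integers only")
    binary = bin(n)[2:]
    # Unary length prefix (zeros) followed by the binary representation.
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits):
    """Decode a concatenation of Elias gamma codes back into integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress(postings):
    """Encode one inverted list as gamma-coded gaps."""
    return "".join(gamma_encode(g) for g in to_gaps(postings))


def decompress(bits):
    """Recover document numbers by summing the decoded gaps."""
    postings, total = [], 0
    for gap in gamma_decode(bits):
        total += gap
        postings.append(total)
    return postings
```

For example, with the three documents "the cat sat", "the dog sat" and "the cat ran", the list for "cat" is [1, 3], its gaps are [1, 2], and compress([1, 3]) yields the bit string "1010". Small gaps get short codes, which is why frequent terms compress well under this kind of scheme.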

Date: 1995

Downloads: (external link)
https://doi.org/10.1002/(SICI)1097-4571(199508)46:73.0.CO;2-P

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:46:y:1995:i:7:p:537-550

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571

More articles in Journal of the American Society for Information Science from Association for Information Science & Technology