
Efficient single‐pass index construction for text databases

Steffen Heinz and Justin Zobel

Journal of the American Society for Information Science and Technology, 2003, vol. 54, issue 8, 713-729

Abstract: Efficient construction of inverted indexes is essential to provision of search over large collections of text data. In this article, we review the principal approaches to inversion, analyze their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approaches and propose a single‐pass inversion method that, in contrast to previous approaches, does not require the complete vocabulary of the indexed collection in main memory, can operate within limited resources, and does not sacrifice speed with high temporary storage requirements. We show that the performance of the single‐pass approach can be improved by constructing inverted files in segments, reducing the cost of disk accesses during inversion of large volumes of data.
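The abstract's central idea, building an inverted index in one pass without holding the complete vocabulary in memory, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_runs` and `merge_runs`, the `max_postings` budget (a stand-in for a real memory limit), and the whitespace tokenizer are all assumptions made for illustration, and the sorted runs are kept in Python lists where a real system would write them to disk.

```python
import heapq
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def build_runs(docs, max_postings=4):
    """Yield sorted runs of (term, [doc_ids]) in a single pass over docs.

    max_postings is a hypothetical stand-in for a memory budget.
    """
    index = defaultdict(list)
    count = 0
    for doc_id, text in enumerate(docs):
        # Naive tokenizer (assumption): lowercase, split on whitespace.
        for term in set(text.lower().split()):
            index[term].append(doc_id)
            count += 1
        if count >= max_postings:
            # Flush a sorted run and discard the in-memory vocabulary,
            # so only the current run's terms are ever held at once.
            yield sorted(index.items())
            index = defaultdict(list)
            count = 0
    if index:
        yield sorted(index.items())


def merge_runs(runs):
    """Merge term-sorted runs into one term -> postings mapping."""
    # heapq.merge breaks ties by input order, so postings stay in
    # ascending doc_id order when runs are passed in creation order.
    merged = heapq.merge(*runs, key=itemgetter(0))
    return {
        term: [doc_id for _, postings in group for doc_id in postings]
        for term, group in groupby(merged, key=itemgetter(0))
    }


docs = ["the cat sat", "the dog sat", "a cat ran"]
runs = list(build_runs(docs, max_postings=4))
index = merge_runs(runs)
```

Each run is sorted by term, so the final merge is a sequential multiway merge rather than a series of random accesses. The segmented construction mentioned in the abstract is not modeled here.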

Date: 2003

Downloads: (external link)
https://doi.org/10.1002/asi.10268


Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:54:y:2003:i:8:p:713-729



More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Handle: RePEc:bla:jamist:v:54:y:2003:i:8:p:713-729