Split size‐rank models for the distribution of index terms

Michael J. Nelson and Jean M. Tague

Journal of the American Society for Information Science, 1985, vol. 36, issue 5, 283-296

Abstract: Since the introduction of the Zipf distribution, many functions have been suggested for the frequency of words in text. Some of these models have also been applied to the distribution of index terms in a set of documents. The models are of two forms: rank‐frequency and frequency‐size. The former serve well to describe the distribution of high‐frequency terms; the latter the distribution of low‐frequency terms. In this article, a split model is proposed, which uses both a rank function for the high‐frequency terms and a size function for the low‐frequency terms, with the point of transition being determined either empirically or by rule. This model is fitted to the marginal empirical term distributions for four document datasets. Distributions to describe index term exhaustivity and term co‐occurrence are also considered briefly.
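
The split described in the abstract can be illustrated with a minimal Python sketch. It only tabulates the two empirical views of a term distribution (rank-frequency above a transition point, frequency-size below it); it does not fit any model, and the abstract does not state the functional forms the authors used. The function name `split_distributions`, the `threshold` parameter, and the toy counts are all hypothetical, introduced here for illustration.

```python
from collections import Counter


def split_distributions(term_counts, threshold):
    """Tabulate a term distribution in two parts around a transition frequency.

    term_counts: mapping of term -> frequency (e.g. number of documents indexed).
    threshold:   assumed transition point; terms with frequency >= threshold go
                 to the rank-frequency part, the rest to the frequency-size part.
    """
    # Sort frequencies in descending order so that rank 1 is the most frequent term.
    freqs = sorted(term_counts.values(), reverse=True)

    # High-frequency part: (rank, frequency) pairs.
    rank_part = [(rank, f) for rank, f in enumerate(freqs, start=1) if f >= threshold]

    # Low-frequency part: (frequency, number of terms with that frequency) pairs.
    size_counts = Counter(f for f in freqs if f < threshold)
    size_part = sorted(size_counts.items())

    return rank_part, size_part


# Toy data, invented for illustration only.
counts = {
    "retrieval": 9, "index": 7, "zipf": 3, "rank": 2,
    "size": 2, "term": 1, "split": 1, "model": 1,
}
rank_part, size_part = split_distributions(counts, threshold=3)
print(rank_part)
print(size_part)
```

Here the transition point is passed in directly, which corresponds to the "by rule" or "empirically" chosen transition only in the loosest sense; a real analysis would then fit a rank function to the first list and a size function to the second.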

Date: 1985

Downloads: (external link)
https://doi.org/10.1002/asi.4630360502

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:36:y:1985:i:5:p:283-296

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571

More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:jamest:v:36:y:1985:i:5:p:283-296