Split size‐rank models for the distribution of index terms
Michael J. Nelson and Jean M. Tague
Journal of the American Society for Information Science, 1985, vol. 36, issue 5, 283-296
Abstract:
Since the introduction of the Zipf distribution, many functions have been suggested for the frequency of words in text. Some of these models have also been applied to the distribution of index terms in a set of documents. The models take two forms: rank‐frequency and frequency‐size. The former describe the distribution of high‐frequency terms well; the latter, the distribution of low‐frequency terms. In this article, a split model is proposed that uses a rank function for the high‐frequency terms and a size function for the low‐frequency terms, with the transition point determined either empirically or by rule. This model is fitted to the marginal empirical term distributions of four document datasets. Distributions describing index term exhaustivity and term co‐occurrence are also considered briefly.
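To illustrate the general idea of a split model, here is a minimal sketch under stated assumptions. It is not the authors' method; the abstract does not give their functional forms or transition rule. Assumptions (all hypothetical): terms above a user-chosen frequency `threshold` follow a Zipf-type rank-frequency power law f(r) = C·r^(-a), frequencies at or below it follow a Lotka-type frequency-size power law n(f) = D·f^(-b), where n(f) is the number of distinct terms occurring exactly f times, and each segment is fitted by ordinary least squares on log-log scales. The names `fit_split_model` and `threshold` are illustrative only.

```python
import math
from collections import Counter


def _fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_split_model(term_freqs, threshold):
    """Fit a hypothetical split model to a list of per-term frequencies.

    High segment (frequency > threshold): rank-frequency law f(r) = C * r**(-a).
    Low segment (frequency <= threshold): frequency-size law n(f) = D * f**(-b).
    """
    freqs = sorted(term_freqs, reverse=True)
    # Rank terms 1, 2, 3, ... by descending frequency; keep only the high segment.
    high = [(r, f) for r, f in enumerate(freqs, start=1) if f > threshold]
    # Count how many distinct terms occur exactly f times, for low frequencies.
    low = sorted(Counter(f for f in freqs if f <= threshold).items())
    if len(high) < 2 or len(low) < 2:
        raise ValueError("each segment needs at least two points to fit")
    log_c, neg_a = _fit_line([math.log(r) for r, _ in high],
                             [math.log(f) for _, f in high])
    log_d, neg_b = _fit_line([math.log(f) for f, _ in low],
                             [math.log(n) for _, n in low])
    return {"C": math.exp(log_c), "a": -neg_a, "D": math.exp(log_d), "b": -neg_b}


if __name__ == "__main__":
    # Synthetic data: six frequent terms with f = 1200 / r, plus a tail
    # with n(f) = 64 / f**2 (64 terms once, 16 twice, 4 four times, 1 eight times).
    data = [1200, 600, 400, 300, 240, 200] + [1] * 64 + [2] * 16 + [4] * 4 + [8]
    print(fit_split_model(data, threshold=10))
```

Here the transition point is simply passed in; an empirical variant might scan candidate thresholds and compare goodness of fit, but any such rule would be an additional assumption, not something stated in the abstract.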
Date: 1985
Downloads: https://doi.org/10.1002/asi.4630360502
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:36:y:1985:i:5:p:283-296
Ordering information: This journal article can be ordered from https://doi.org/10.1002/(ISSN)1097-4571
More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.