A probabilistic approach to automatic keyword indexing. Part II. An algorithm for probabilistic indexing
Stephen P. Harter
Journal of the American Society for Information Science, 1975, vol. 26, issue 5, 280-289
Abstract:
In Part I of this study, a mixture of two Poisson distributions was examined as a model of specialty word distribution. Formulas expressing the three parameters of the model in terms of empirical frequency statistics were derived, and a statistical measure intended to identify specialty words, consistent with the model, was proposed. In the present paper, Part II of the study, a probabilistic model of keyword indexing is outlined, and some of the consequences of the model are examined. An algorithm defining a measure of indexability is developed, a measure intended to reflect the relative significance of words in documents. The measure is evaluated and is found to consistently produce indexes superior to those produced by another measure which had previously been identified in the literature as producing the best results.
Date: 1975
Downloads: (external link)
https://doi.org/10.1002/asi.4630260504
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:26:y:1975:i:5:p:280-289