A probabilistic approach to automatic keyword indexing. Part II. An algorithm for probabilistic indexing

Harter, Stephen P.

A probabilistic approach to automatic keyword indexing. Part II. An algorithm for probabilistic indexing

Stephen P. Harter

Journal of the American Society for Information Science, 1975, vol. 26, issue 5, 280-289

Abstract: In Part I of this study,* a mixture of two Poisson distributions was examined as a model of specialty word distribution. Formulas expressing the three parameters of the model in terms of empirical frequency statistics were derived, and a statistical measure intended to identify specialty words, consistent with the model, was proposed. In the present paper, Part II of the study, a probabilistic model of keyword indexing is outlined, and some of the consequences of the model are examined. An algorithm defining a measure of indexability is developed‐a measure intended to reflect the relative significance of words in documents. The measure is evaluated and is found to consistently produce indexes superior to those produced by another measure which had previously been identified in the literature as producing the best results.

Date: 1975
References: Add references at CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
https://doi.org/10.1002/asi.4630260504

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:26:y:1975:i:5:p:280-289

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571

Access Statistics for this article

More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().