Statistical recognition of content terms in general text
Martin Dillon and Peggy Federhart
Journal of the American Society for Information Science, 1984, vol. 35, issue 1, 1-10
Abstract:
This article discusses ways to improve the quality of retrieval systems that depend on the use of truncated words or quasi‐word stems as an indexing vocabulary. The problems addressed are the generalizability and stability of discriminant function analysis for selecting good topical terms from terms of relatively high frequency in a database drawn from abstracts of Harris Survey press releases. Results confirm that topical terms can be identified by their statistical properties. Consistently high recall of topical terms under a variety of different conditions implies persistent underlying properties strong enough to resist changes in test environment.
Date: 1984
Downloads: https://doi.org/10.1002/asi.4630350102
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:35:y:1984:i:1:p:1-10
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571
More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.