A theory of term importance in automatic text analysis
G. Salton, C. S. Yang and C. T. Yu
Journal of the American Society for Information Science, 1975, vol. 26, issue 1, 33-44
Abstract:
A good deal of work has been done over the years in an attempt to use statistical or probabilistic techniques as a basis for automatic indexing and content analysis. (1–10) Unfortunately, many of these methods are lacking in effectiveness, and the more refined procedures are computationally unattractive. A new technique, known as discrimination value analysis, ranks the text words in accordance with how well they are able to discriminate the documents of a collection from each other; that is, the value of a term depends on how much the average separation between individual documents changes when the given term is assigned for content identification. The best words are those which achieve the greatest separation. The discrimination value analysis is computationally simple, and it assigns a specific role in content analysis to single words, juxtaposed words and phrases, and word groups or thesaurus categories. Experimental results are given showing the effectiveness of the technique.
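The abstract describes the idea in words only, so the following is a minimal sketch of how a discrimination value could be computed, not the authors' exact procedure. It assumes documents are equal-length term-weight vectors, uses average pairwise cosine similarity as the measure of how tightly packed the collection is, and scores each term by how much that density rises when the term is removed. The function names (`cosine`, `density`, `discrimination_values`) and the similarity measure are illustrative assumptions, not taken from the article.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(docs):
    """Average pairwise similarity; a higher value means documents are less separated.

    Assumption: pairwise cosine is used here for clarity. Requires at least two documents.
    """
    n = len(docs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(docs[i], docs[j]) for i, j in pairs) / len(pairs)


def discrimination_values(docs):
    """Score each term as (density without the term) minus (density with all terms).

    A positive score means assigning the term spreads the documents apart,
    so the term is a useful discriminator under this sketch's assumptions.
    """
    base = density(docs)
    n_terms = len(docs[0])
    values = []
    for k in range(n_terms):
        # Rebuild every document vector with term k dropped.
        without_k = [[w for t, w in enumerate(d) if t != k] for d in docs]
        values.append(density(without_k) - base)
    return values


# Hypothetical toy collection: term 0 occurs in every document,
# terms 1 to 3 each occur in exactly one document.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
values = discrimination_values(docs)
```

In this toy collection the term shared by every document receives a negative value, since removing it makes the documents fully distinct, while each document-specific term receives a positive value. Ranking terms by these scores corresponds to the ordering the abstract describes, under the assumptions stated above.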
Date: 1975
Downloads: https://doi.org/10.1002/asi.4630260106
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:26:y:1975:i:1:p:33-44
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571
More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.