The use of titles for automatic document classification
Karen A. Hamill and Antonio Zamora
Journal of the American Society for Information Science, 1980, vol. 31, issue 6, 396-402
Abstract:
An experimental computer program has been developed to classify documents according to the 80 sections and five major section groupings of Chemical Abstracts (CA). The program uses pattern recognition techniques supplemented by heuristics. During the “training” phase, words from pre‐classified documents are selected, and the probability of occurrence of each word in each section of CA is computed and stored in a reference dictionary. The “classification” phase matches each word of a document title against the dictionary and assigns a section number to the document using weights derived from the probabilities in the dictionary. Heuristic techniques are used to normalize word variants such as plurals, past tenses, and gerunds in both the training phase and the classification phase. The dictionary lookup technique is supplemented by the analysis of chemical nomenclature terms into their component word roots to influence the section to which the documents are assigned. Program performance and human consistency have been evaluated by comparing the program results against the published sections of CA and by conducting an experiment with people experienced in the assignment of documents to CA sections. The program assigned approximately 78% of the documents to the correct major section groupings of CA and 67% to the correct sections or cross‐references at a rate of 100 documents per second.
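The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' program: the suffix rules in `normalize`, the use of the relative frequency of a word across sections as its weight, the summing of weights in `classify`, and the section labels in the usage note are all assumptions made for illustration. The analysis of chemical nomenclature terms into word roots is not covered.

```python
from collections import Counter, defaultdict


def normalize(word):
    """Crude suffix stripping for gerunds, past tenses, and plurals (assumed rules)."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Strip at most one suffix, and only if at least three letters remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def train(labeled_titles):
    """Training phase: map each normalized word to its per-section relative frequency."""
    counts = defaultdict(Counter)
    for title, section in labeled_titles:
        for word in title.split():
            counts[normalize(word)][section] += 1
    dictionary = {}
    for word, per_section in counts.items():
        total = sum(per_section.values())
        dictionary[word] = {s: n / total for s, n in per_section.items()}
    return dictionary


def classify(title, dictionary):
    """Classification phase: sum the weights of known title words, pick the top section."""
    scores = Counter()
    for word in title.split():
        for section, weight in dictionary.get(normalize(word), {}).items():
            scores[section] += weight
    # None signals that no title word was found in the dictionary.
    return scores.most_common(1)[0][0] if scores else None
```

As a usage example with hypothetical section labels, training on `[("Polymer coatings", "polymers"), ("Enzyme kinetics", "biochemistry")]` and then calling `classify("Mixing polymers", dictionary)` returns `"polymers"`, because "polymers" normalizes to the same dictionary entry as "Polymer".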
Date: 1980
Downloads: https://doi.org/10.1002/asi.4630310603
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:31:y:1980:i:6:p:396-402
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571
More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.