The use of titles for automatic document classification

Karen A. Hamill and Antonio Zamora

Journal of the American Society for Information Science, 1980, vol. 31, issue 6, 396-402

Abstract: An experimental computer program has been developed to classify documents according to the 80 sections and five major section groupings of Chemical Abstracts (CA). The program uses pattern recognition techniques supplemented by heuristics. During the “training” phase, words from pre‐classified documents are selected, and the probability of occurrence of each word in each section of CA is computed and stored in a reference dictionary. The “classification” phase matches each word of a document title against the dictionary and assigns a section number to the document using weights derived from the probabilities in the dictionary. Heuristic techniques are used to normalize word variants such as plurals, past tenses, and gerunds in both the training phase and the classification phase. The dictionary lookup technique is supplemented by the analysis of chemical nomenclature terms into their component word roots to influence the section to which the documents are assigned. Program performance and human consistency have been evaluated by comparing the program results against the published sections of CA and by conducting an experiment with people experienced in the assignment of documents to CA sections. The program assigned approximately 78% of the documents to the correct major section groupings of CA and 67% of the correct sections or cross‐references at a rate of 100 documents per second.
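The abstract outlines a two-phase scheme (build a word-to-section probability dictionary from pre-classified titles, then score new titles against it) but does not give the exact formulas. The sketch below is only a minimal illustration under loudly stated assumptions. The dictionary stores P(section | word) estimated from per-title counts. A title's score for a section is the plain sum of its words' probabilities. The suffix rules and stopword list are invented stand-ins for the paper's heuristics. The chemical-nomenclature root analysis and the mapping to the five major groupings are omitted. The names `normalize`, `tokens`, `train`, `classify`, and the section labels are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; the abstract only says that words are "selected".
STOPWORDS = {"a", "an", "and", "by", "for", "from", "in", "of", "on", "the", "to", "with"}


def normalize(word: str) -> str:
    """Crudely fold plurals, past tenses and gerunds onto one stem (assumed rules)."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        return w[:-3]                      # reacting -> react
    if w.endswith("ed") and len(w) > 4:
        return w[:-2]                      # reacted -> react
    if w.endswith("s") and not w.endswith(("ss", "is", "us")) and len(w) > 3:
        return w[:-1]                      # oxides -> oxide; glass, analysis kept
    return w


def tokens(title: str) -> list[str]:
    """Split a title into normalized, non-stopword terms."""
    words = re.findall(r"[a-z]+", title.lower())
    return [normalize(w) for w in words if w not in STOPWORDS]


def train(docs):
    """Build word -> {section: P(section | word)} from (title, section) pairs."""
    counts = defaultdict(Counter)
    for title, section in docs:
        for term in set(tokens(title)):    # count each term once per title
            counts[term][section] += 1
    dictionary = {}
    for term, by_section in counts.items():
        total = sum(by_section.values())
        dictionary[term] = {s: n / total for s, n in by_section.items()}
    return dictionary


def classify(title, dictionary):
    """Sum each title word's section probabilities; return the best section or None."""
    scores = Counter()
    for term in tokens(title):
        for section, p in dictionary.get(term, {}).items():
            scores[section] += p
    if not scores:
        return None                        # no title word was in the dictionary
    return scores.most_common(1)[0][0]


# Toy training set with hypothetical section labels (not real CA section numbers).
training = [
    ("Purification of enzymes from yeast", "biochem"),
    ("Enzyme kinetics in yeast extracts", "biochem"),
    ("Thermal degradation of polymers", "polymer"),
    ("Polymerizing styrene with new catalysts", "polymer"),
]
dictionary = train(training)
print(classify("Kinetics of purified enzymes", dictionary))  # biochem
print(classify("Quantum tunneling", dictionary))             # None
```

Summing probabilities, rather than multiplying them as a naive Bayes classifier would, keeps a single unseen or rare word from zeroing out a section; the abstract does not say whether the original program weighted words this way.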

Date: 1980
Citations: 1 (as recorded in EconPapers)

Downloads: https://doi.org/10.1002/asi.4630310603

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:31:y:1980:i:6:p:396-402

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571

More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:jamest:v:31:y:1980:i:6:p:396-402