Indexing and automatic significance analysis

Ivo Steinacker

Journal of the American Society for Information Science, 1974, vol. 25, issue 4, 237-241

Abstract: Intellectual indexing proceeds on three levels: the selection of phrases occurring in the document text (sequential indexing), the posting of specific phrases from the text to generic descriptors (generic indexing), and the choice of descriptors which are implicit to the document text (symbolic indexing). Automation has been attempted on all three levels: by concordance and autoposting. Here an algorithm is proposed for sequential indexing that uses no grammatical or semantic analysis, but follows the principle of emulating human judgement by evaluating machine‐recognizable attributes of structured word assemblies (text). The algorithm produces “text cuts” of a few words in length and orders them alphabetically. Every “text cut” which then appears with a certain limit frequency or above is considered significant (by human standards). The algorithm has been applied to a text body of about 220,000 words from the NASA bibliographic file, yielding an “established” dictionary of significant terms. A phrase not occurring in the established dictionary is not suppressed but posted to a floating dictionary, from which it may be transferred to the established dictionary if its usage increases above the limit frequency. The algorithm thus presents a tool for the creation and maintenance of a “self‐adaptive” data base of text information.
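
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not specify how a "text cut" is delimited, so the sketch assumes a cut is any run of `min_len` to `max_len` consecutive words, and the function names (`text_cuts`, `count_cuts`, `update_dictionaries`), the tokenization rule, and the sample titles are all hypothetical. Cuts are sorted alphabetically so that identical cuts become adjacent and can be counted in one pass; cuts reaching the limit frequency go to the established dictionary, all others stay in the floating dictionary.

```python
import re
from itertools import groupby


def text_cuts(text, min_len=2, max_len=3):
    """Yield every run of min_len..max_len consecutive words.

    Assumption: this is only one possible reading of "text cut".
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def count_cuts(documents, min_len=2, max_len=3):
    """Sort all cuts alphabetically, then count runs of identical cuts."""
    cuts = sorted(
        cut for doc in documents for cut in text_cuts(doc, min_len, max_len)
    )
    return {cut: sum(1 for _ in group) for cut, group in groupby(cuts)}


def update_dictionaries(documents, established, floating, limit=2):
    """Post new cuts to the floating dictionary and promote frequent ones.

    Both dictionaries map a phrase to its accumulated frequency.
    """
    for cut, n in count_cuts(documents).items():
        if cut in established:
            established[cut] += n
            continue
        floating[cut] = floating.get(cut, 0) + n
        if floating[cut] >= limit:
            # Usage has reached the limit frequency: transfer the phrase.
            established[cut] = floating.pop(cut)
    return established, floating


# Hypothetical sample titles, not taken from the NASA file.
docs = [
    "Heat transfer in turbulent boundary layers",
    "Turbulent boundary layers over flat plates",
    "Measurement of heat transfer in boundary layers",
]
established, floating = update_dictionaries(docs, {}, {}, limit=2)

# A later batch can promote a phrase that was previously only floating.
update_dictionaries(["Flat plates in supersonic flow"], established, floating)
```

Keeping infrequent phrases in the floating dictionary rather than discarding them is what allows the later batch to promote "flat plates" once its accumulated count reaches the limit, which is the "self‐adaptive" behaviour the abstract describes. Note that this sketch also promotes function-word sequences such as "transfer in"; the abstract does not say how the original algorithm treats such cases.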

Date: 1974

Downloads: (external link)
https://doi.org/10.1002/asi.4630250406

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:25:y:1974:i:4:p:237-241

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571

More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:jamest:v:25:y:1974:i:4:p:237-241