An experimental system for automatic identification of personal names and personal titles in newspaper texts
Casimir Borkowski
American Documentation, 1967, vol. 18, issue 3, 131-138
Abstract:
Natural language seems to contain various special‐purpose subsystems, e.g., personal titles, personal names, dates, street addresses, and place names, each with its own structure, which is rather simple relative to the total structure of the language. The ability to automatically identify words and word strings belonging to such special‐purpose linguistic subsystems (akin to some thesaurus classes) may prove very useful, since these subsystems play an important role in the making of indexes and in various systems for extracting and distributing information. This article describes some of the main problems involved in automatically identifying, in newspaper texts, words and word strings belonging to two important linguistic subsystems, viz., personal titles and personal names; lists some of the major rules of an algorithm designed to perform this task; presents statistics on the algorithm's accuracy and exhaustiveness, obtained by applying the algorithm manually to texts; and suggests some applications for computer programs capable of recognizing personal titles and names. The results indicate that an automatic system capable of accurate and exhaustive identification of personal titles and names in texts requires rather complex recognition procedures. It is therefore suggested that, along with researching and developing methods for high‐quality automatic classification of words in texts, it may be advisable to set up efficient procedures for the manual classification and tagging of words in texts, and for the automatic extraction of data from texts that have been recognized either manually or automatically.
Such action seems appropriate, since automatic extraction of information from manually recognized texts would probably constitute a valuable service, and, when automatic procedures for identifying dates, personal names, personal titles, trade names, company names, chemical formulas, numbers and measure words, and so forth become competitive with manual ones, the data‐processing profession will already be in possession of operational computer programs capable of extracting data from recognized texts.
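The abstract does not reproduce the algorithm's rules. Purely as a hypothetical illustration of the kind of pattern such rules must handle (this is not Borkowski's algorithm), a minimal Python sketch can tag a word from a small, assumed title list, followed by capitalized words or single-letter initials, as a personal title plus a personal name. The title list, the tokenizer, and the function name `find_titles_and_names` are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately tiny title lexicon (assumed; not from the paper).
TITLES = {"mr.", "mrs.", "miss", "dr.", "gov.", "sen.", "senator",
          "president", "mayor", "judge"}

# Words (optionally ending in a period, e.g. "Sen." or the initial "V."),
# or any single non-space, non-letter character such as punctuation.
TOKEN_RE = re.compile(r"[A-Za-z]+\.?|[^\sA-Za-z]")


def find_titles_and_names(text: str) -> List[Tuple[str, str]]:
    """Return (title, name) pairs using one naive, assumed rule:
    a known title followed by one or more capitalized non-title words."""
    tokens = TOKEN_RE.findall(text)
    found = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() not in TITLES:
            i += 1
            continue
        title = tokens[i]
        name = []
        j = i + 1
        while (j < len(tokens) and tokens[j][:1].isupper()
               and tokens[j].lower() not in TITLES):
            word = tokens[j]
            j += 1
            if word.endswith(".") and len(word) > 2:
                # Treat a trailing period on a longer word as sentence-final:
                # keep the word without the period and end the name here.
                name.append(word[:-1])
                break
            name.append(word)  # capitalized word or initial such as "V."
        if name:
            found.append((title, " ".join(name)))
            i = j
        else:
            i += 1
    return found


print(find_titles_and_names("Yesterday Mayor John V. Smith met Sen. Brown."))
# [('Mayor', 'John V. Smith'), ('Sen.', 'Brown')]
print(find_titles_and_names("Judge Orders Recount"))
# [('Judge', 'Orders Recount')]  (a false positive on a headline)
```

The second call shows how quickly such a simple rule misfires on capitalized newspaper headlines, which is in line with the abstract's conclusion that accurate and exhaustive identification requires rather complex recognition procedures.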
Date: 1967
Downloads: https://doi.org/10.1002/asi.5090180305
Persistent link: https://EconPapers.repec.org/RePEc:bla:amedoc:v:18:y:1967:i:3:p:131-138
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1936-6108
American Documentation is currently edited by Javed Mostafa
More articles in American Documentation from Wiley Blackwell
Bibliographic data for series maintained by Wiley Content Delivery.