An expert system for quality control in bibliographic databases

Claudio Todeschini and Michael P. Farrell

Journal of the American Society for Information Science, 1989, vol. 40, issue 1, 1-11

Abstract: An Expert System is presented that can identify errors in the intellectual decisions made by indexers when categorizing documents into an a priori category scheme. The system requires the compilation of a Knowledge Base that incorporates in statistical form the decisions on the linking of indexing and categorization derived from a preceding period of the bibliographic database. New input entering the database is checked against the Knowledge Base, using the descriptor indexing assigned to each record, and the system computes a value for the match of each record with the particular category chosen by the indexer. This category match value is used as a criterion for identifying those documents that have been erroneously categorized. The system was tested on a large sample of almost 26,000 documents, representing all the literature falling into ten of the subject categories of the Energy Data Base during the five year period 1980–1984. The Energy Data Base is a large bibliographic database covering the world's energy‐related literature. For valid comparisons among categories, the Knowledge Base must be constructed with an approximately equal number of unique descriptors for each subject category. The system identified those items with high probability of having been erroneously categorized. These items, constituting up to 5% of the sample, were evaluated manually by subject specialists for correct categorization and then compared with the results of the Expert System. Of those pieces of literature deemed by the system to be erroneously categorized, about 75% did indeed belong to a different category. This percentage, however, is dependent on the level at which the threshold on the category match value is set. With a lower threshold value, the percentage can be raised to 90%, but this is accompanied by a lowering of the absolute number of wrongly categorized records caught by the system. 
The Expert System can be considered as a first step to a complete semiautomatic categorization system requiring human intervention only in poorly indexed pieces of literature. It is also self‐improving, since in an operational environment the Knowledge Base would be routinely updated, using the most recent period of the database from which erroneously categorized items would have been eliminated by the previous version of the Knowledge Base; hence, each new version will produce better grounds for decision making. © 1989 John Wiley & Sons, Inc.
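
The checking procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the category match value is computed, so the score used here (the mean share of each descriptor's past uses that fell in the chosen category) is an assumption made for illustration only, not the authors' formula. The function names, the toy records, and the 0.5 threshold are likewise hypothetical.

```python
from collections import Counter, defaultdict


def build_knowledge_base(records):
    """Count how often each descriptor co-occurred with each category
    in a preceding period of the database."""
    kb = defaultdict(Counter)
    for descriptors, category in records:
        for d in set(descriptors):
            kb[d][category] += 1
    return kb


def category_match(kb, descriptors, category):
    """Assumed score: mean share of each descriptor's past uses that fell
    in `category`. Descriptors absent from the knowledge base are skipped;
    returns None when no descriptor is known."""
    shares = []
    for d in set(descriptors):
        counts = kb.get(d)
        if counts:
            shares.append(counts[category] / sum(counts.values()))
    return sum(shares) / len(shares) if shares else None


def flag_suspects(kb, new_records, threshold):
    """Return (index, score) for records whose match value with the
    indexer's chosen category falls below the threshold."""
    suspects = []
    for i, (descriptors, category) in enumerate(new_records):
        score = category_match(kb, descriptors, category)
        if score is not None and score < threshold:
            suspects.append((i, score))
    return suspects


# Toy data, invented for this example.
history = [
    (["solar cells", "photovoltaic conversion"], "solar"),
    (["solar cells", "silicon"], "solar"),
    (["reactor cores", "fuel rods"], "nuclear"),
    (["fuel rods", "uranium"], "nuclear"),
]
kb = build_knowledge_base(history)

new_input = [
    (["solar cells", "silicon"], "solar"),   # consistent with history
    (["fuel rods", "uranium"], "solar"),     # likely miscategorized
]
suspects = flag_suspects(kb, new_input, threshold=0.5)
print(suspects)
```

Under this assumed score, moving the threshold changes how many records are flagged, which mirrors the trade-off the abstract reports between the share of flagged records that are truly miscategorized and the absolute number of errors caught.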

Date: 1989

Downloads: (external link)
https://doi.org/10.1002/(SICI)1097-4571(198901)40:13.0.CO;2-A

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:40:y:1989:i:1:p:1-11

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571

More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Handle: RePEc:bla:jamest:v:40:y:1989:i:1:p:1-11