Training a hierarchical classifier using inter document relationships
Susan Gauch, Aravind Chandramouli and Shankar Ranganathan
Journal of the American Society for Information Science and Technology, 2009, vol. 60, issue 1, 47-58
Abstract:
Text classifiers automatically classify documents into appropriate concepts for different applications. Most classification approaches use flat classifiers that treat each concept as independent, even when the concept space is hierarchically structured. In contrast, hierarchical text classification exploits the structural relationships between the concepts. In this article, we explore the effectiveness of hierarchical classification for a large concept hierarchy. Since the quality of the classification is dependent on the quality and quantity of the training data, we evaluate the use of documents selected from subconcepts to address the sparseness of training data for the top‐level classifiers and the use of document relationships to identify the most representative training documents. By selecting training documents using structural and similarity relationships, we achieve a statistically significant improvement of 39.8% (from 54.5% to 76.2%) in the accuracy of the hierarchical classifier over that of the flat classifier for a large, three‐level concept hierarchy.
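The abstract names two training-data ideas without spelling out the algorithm: pooling documents from subconcepts to give sparse upper-level concepts more examples, and using similarity between documents to keep only the most representative ones. The sketch below illustrates both under stated assumptions and is not the authors' implementation. It assumes bag-of-words term-frequency vectors, cosine similarity, a centroid (Rocchio-style) profile per concept, "most representative" meaning "closest to the pooled centroid", and top-down routing through the hierarchy. The hierarchy, the documents, and the parameter `k` are all hypothetical toy data. As a side note on the reported figure, the 39.8% gain is relative: (76.2 − 54.5) / 54.5 ≈ 0.398.

```python
import math
from collections import Counter

# Hypothetical toy concept hierarchy: parent -> child concepts.
HIERARCHY = {
    "root": ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Soccer", "Tennis"],
}

# Hypothetical training documents attached directly to each concept.
TRAINING_DOCS = {
    "Science": ["research experiment theory"],
    "Physics": ["quantum particle energy experiment", "gravity mass energy"],
    "Biology": ["cell gene protein experiment", "evolution species gene"],
    "Sports": ["team match season"],
    "Soccer": ["goal football match team", "striker goal league"],
    "Tennis": ["serve racket match set", "racket court serve"],
}


def vectorize(text):
    """Bag-of-words term-frequency vector (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def descendants(concept):
    """The concept itself plus every concept below it in the hierarchy."""
    out = [concept]
    for child in HIERARCHY.get(concept, []):
        out.extend(descendants(child))
    return out


def pooled_docs(concept):
    """Structural relationship: add subconcept documents to the concept's own."""
    return [d for c in descendants(concept) for d in TRAINING_DOCS.get(c, [])]


def centroid(vectors):
    """Sum of term-frequency vectors; dividing by the count would not change cosine scores."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def select_representative(concept, k):
    """Similarity relationship (assumed form): keep the k pooled docs closest to the pooled centroid."""
    vecs = [vectorize(d) for d in pooled_docs(concept)]
    c = centroid(vecs)
    ranked = sorted(vecs, key=lambda v: cosine(v, c), reverse=True)
    return ranked[:k]


def build_profiles(k=3):
    """One centroid profile per non-root concept, built from its selected documents."""
    concepts = [c for c in descendants("root") if c != "root"]
    return {c: centroid(select_representative(c, k)) for c in concepts}


def classify(text, profiles):
    """Top-down routing: at each level, descend into the most similar child concept."""
    doc = vectorize(text)
    path, node = [], "root"
    while HIERARCHY.get(node):
        node = max(HIERARCHY[node], key=lambda ch: cosine(doc, profiles[ch]))
        path.append(node)
    return path


if __name__ == "__main__":
    profiles = build_profiles(k=3)
    print(classify("gene protein cell", profiles))
    print(classify("racket serve court", profiles))
```

With this toy data, `"gene protein cell"` is routed to `Science` and then `Biology`, and `"racket serve court"` to `Sports` and then `Tennis`. Routing top-down means each decision only compares siblings. Building each upper-level profile from pooled subconcept documents is what keeps those early decisions from depending on the one or two documents attached directly to a top-level concept.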
Date: 2009
Downloads: https://doi.org/10.1002/asi.20951
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:60:y:2009:i:1:p:47-58
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890
More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.