An ontology‐based technique for preserving user preferences in document‐category evolutions
Yen‐Hsien Lee, Chih‐Ping Wei and Paul Jen‐Hwa Hu
Journal of the American Society for Information Science and Technology, 2011, vol. 62, issue 3, 507-520
Abstract:
Influxes of new documents over time necessitate reorganization of document categories that a user has created previously. As documents are available in increasing quantities and accelerating frequencies, the manual approach to reorganizing document categories becomes prohibitively tedious and ineffective, thus making a system‐oriented approach appealing. Previous research (Larsen & Aone, 1999; Pantel & Lin, 2002) has largely followed the category‐discovery approach, which groups documents by using a document‐clustering technique to partition a document corpus. This approach does not consider the categories a user has already created, which in effect reflect his or her document‐grouping preference. A handful of studies (Wei, Hu, & Dong, 2002; Wei, Hu, & Lee, 2009) have taken a category‐evolution approach to develop lexicon‐based techniques for preserving user preference in document‐category reorganizations, but these techniques have serious limitations. Responding to the significance of document‐category reorganizations and addressing the fundamental problems of salient, lexicon‐based techniques, we develop ontology‐based category evolution (ONCE), a technique that first enriches a concept hierarchy by incorporating important concept descriptors (jointly referred to as an ontology) and then employs the resulting enriched ontology to support category evolutions at a concept level rather than analyzing and comparing feature vectors at the lexicon level. We empirically evaluate our proposed technique and compare it with two benchmark techniques: CE2 (a lexicon‐based category‐evolution technique) and hierarchical agglomerative clustering (HAC; a conventional hierarchical document‐clustering technique). Overall, our results show that the ONCE technique is more effective than CE2 and HAC across all the scenarios studied. Furthermore, the completeness of a concept hierarchy has an important impact on the performance of the proposed technique.
Our results have some important implications for further research.
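The contrast the abstract draws between lexicon-level and concept-level comparison can be illustrated with a small sketch. This is not the ONCE technique itself, whose details the abstract does not give; the term-to-concept mapping, the function names, and the two sample documents below are all hypothetical and serve only to show why two documents with no shared words can still look similar once terms are mapped to concepts.

```python
import math
from collections import Counter

# Hypothetical toy mapping from terms to concepts (an assumption for
# illustration only; it is not the enriched ontology from the paper).
TERM_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "loan": "finance",
    "mortgage": "finance",
}


def lexicon_vector(text):
    """Bag-of-words vector: one dimension per distinct term."""
    return Counter(text.lower().split())


def concept_vector(text):
    """Bag-of-concepts vector: terms are mapped to concepts first.

    Terms missing from the toy mapping are ignored.
    """
    counts = Counter()
    for term in text.lower().split():
        concept = TERM_TO_CONCEPT.get(term)
        if concept is not None:
            counts[concept] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two hypothetical documents that share no terms but cover the same concepts.
doc_a = "car loan"
doc_b = "automobile mortgage"

lexicon_sim = cosine(lexicon_vector(doc_a), lexicon_vector(doc_b))
concept_sim = cosine(concept_vector(doc_a), concept_vector(doc_b))
```

Under these assumptions, the lexicon-level similarity is zero because the documents have no words in common, while the concept-level similarity is close to one because both map to the same two concepts. Terms absent from the mapping contribute nothing at the concept level, which gives an informal sense of why the completeness of the concept hierarchy would matter.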
Date: 2011
Downloads: https://doi.org/10.1002/asi.21471
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:62:y:2011:i:3:p:507-520
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890
More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology