On Knowledge-Enhanced Document Clustering
Manjeet Rege, Josan Koruthu and Reynold Bailey
Additional contact information
Manjeet Rege: Rochester Institute of Technology, Rochester, NY, USA
Josan Koruthu: Rochester Institute of Technology, Rochester, NY, USA
Reynold Bailey: Rochester Institute of Technology, Rochester, NY, USA
International Journal of Information Retrieval Research (IJIRR), 2012, vol. 2, issue 3, 72-82
Abstract:
Document clustering plays an important role in text analytics by finding natural groupings of documents based on their similarity, as determined by the words appearing in them. Many of the clustering algorithms accessible through text analytics tools are completely unsupervised. That is, they cannot incorporate any domain knowledge that might be available about the documents to improve clustering accuracy and relevance. The authors present a graph partitioning based semi-supervised document clustering algorithm. The user provides knowledge about a few of the documents in the form of “must-link” and “cannot-link” constraints between pairs of documents. A “must-link” constraint between two documents expresses that the user feels the two documents must be clustered together irrespective of their dissimilarity. Similarly, a “cannot-link” constraint signifies that the two documents should never be clustered together, no matter how similar they might happen to be. These constraints are then incorporated into a computationally efficient, graph partitioning based document clustering algorithm. The proposed framework is validated through experiments performed on publicly available text datasets.
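The constraint mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes cosine similarity over word counts as the edge weight, assumes a must-link sets an edge weight to 1 and a cannot-link sets it to 0, and swaps in a plain two-way normalized spectral cut as the partitioning step. The function names `constrained_affinity` and `spectral_bisect` and the toy documents are hypothetical.

```python
import math
import random
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def constrained_affinity(docs, must_link=(), cannot_link=()):
    """Word-overlap affinity matrix with constrained pairs overridden.

    Assumption (not from the paper): must-link -> weight 1.0,
    cannot-link -> weight 0.0, regardless of textual similarity.
    """
    bags = [Counter(d.lower().split()) for d in docs]
    n = len(docs)
    w = [[0.0 if i == j else cosine(bags[i], bags[j]) for j in range(n)]
         for i in range(n)]
    for i, j in must_link:
        w[i][j] = w[j][i] = 1.0
    for i, j in cannot_link:
        w[i][j] = w[j][i] = 0.0
    return w


def spectral_bisect(w, iters=200, seed=0):
    """Two-way normalized spectral cut via deflated power iteration.

    Iterates with M = (I + D^-1/2 W D^-1/2) / 2 while projecting out the
    trivial eigenvector sqrt(degree); the signs of the result give the split.
    """
    n = len(w)
    root = [math.sqrt(sum(row) or 1e-12) for row in w]  # guard zero degree
    rnorm = math.sqrt(sum(r * r for r in root))
    v1 = [r / rnorm for r in root]
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        y = [0.5 * (x[i] + sum(w[i][j] * x[j] / (root[i] * root[j])
                               for j in range(n)))
             for i in range(n)]
        proj = sum(a * b for a, b in zip(y, v1))
        y = [a - proj * b for a, b in zip(y, v1)]  # remove trivial component
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return [0 if x[i] / root[i] >= 0 else 1 for i in range(n)]


docs = [
    "graph partition cut spectral",
    "graph cut spectral cluster",
    "soccer goal match team",
    "soccer team league goal",
]

# Unsupervised: grouping follows word overlap.
plain = spectral_bisect(constrained_affinity(docs))

# Semi-supervised: user constraints override word overlap.
guided = spectral_bisect(constrained_affinity(
    docs,
    must_link=[(0, 2), (1, 3)],
    cannot_link=[(0, 1), (2, 3)],
))
```

In this toy run, `plain` groups documents by shared vocabulary, while `guided` pairs document 0 with 2 and document 1 with 3 because the constraints rewrite the graph before it is cut. Overriding edge weights is only one possible way to encode constraints, and a cannot-link encoded this way can still be bypassed through intermediate documents on larger graphs.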
Date: 2012
Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 018/ijirr.2012070105 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:igg:jirr00:v:2:y:2012:i:3:p:72-82
International Journal of Information Retrieval Research (IJIRR) is currently edited by Zhongyu Lu