Topical document clustering: two-stage post processing technique

Poonam Goyal, N. Mehala, Divyansh Bhatia and Navneet Goyal

International Journal of Data Mining, Modelling and Management, 2018, vol. 10, issue 2, 127-170

Abstract: Clustering documents is an essential step in improving the efficiency and effectiveness of information retrieval systems. We propose a two-phase split-merge (SM) algorithm that can be applied to the topical clusters produced by existing query-context-aware document clustering algorithms to yield soft topical document clusters. SM is a post-processing technique that combines the advantages of the document-pivot and feature-pivot approaches to topical document clustering. The split phase relates each topical cluster to the topics obtained by disambiguating web search results and splits it into homogeneous soft clusters. The merge phase then merges similar clusters using a feature-pivot approach. SM is evaluated on the output of two hierarchical query-context-aware document clustering algorithms over several datasets, including the TREC 2011 Session Track dataset. The resulting topical clusters are also updated incrementally as the data stream progresses. In all the experiments conducted, the proposed algorithm improves clustering quality appreciably.
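The two phases described in the abstract can be illustrated with a small Python sketch. This is not the authors' SM algorithm, whose details are in the subscriber-only full text. The document representation (term-count dictionaries), the topic representation (keyword sets standing in for topics obtained by disambiguating web search results), the keyword-overlap rule in the split phase, and the cosine-similarity threshold in the merge phase are all assumptions made for illustration. The names split_phase, merge_phase, min_overlap and threshold are hypothetical.

```python
import math
from collections import Counter

# Assumed representation (not from the paper): each document is a Counter of
# term frequencies, and each topic is a set of keywords.

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def split_phase(clusters, docs, topics, min_overlap=1):
    """Hypothetical split step: break each input cluster into topic-homogeneous
    sub-clusters. A document joins every topic with which it shares at least
    `min_overlap` keywords, so the output clusters may overlap (soft)."""
    result = []
    for members in clusters:
        buckets = {}
        for doc_id in members:
            terms = set(docs[doc_id])
            matched = [name for name, keywords in topics.items()
                       if len(terms & keywords) >= min_overlap]
            # Documents matching no topic stay together in a leftover bucket.
            for name in matched or ["<unmatched>"]:
                buckets.setdefault(name, set()).add(doc_id)
        result.extend(buckets.values())
    return result

def merge_phase(clusters, docs, threshold=0.5):
    """Hypothetical merge step: repeatedly merge the first pair of clusters
    whose aggregated term vectors have cosine similarity >= threshold."""
    clusters = [set(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        # Cluster feature vector = summed term counts of its member documents.
        vectors = [sum((docs[d] for d in c), Counter()) for c in clusters]
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if cosine(vectors[i], vectors[j]) >= threshold:
                    clusters[i] |= clusters.pop(j)  # j > i, so index i is unaffected
                    merged = True
                    break
            if merged:
                break
    return clusters

if __name__ == "__main__":
    # Toy data (invented for illustration): an ambiguous query "jaguar".
    docs = {
        "d1": Counter({"jaguar": 2, "car": 3}),
        "d2": Counter({"jaguar": 1, "cat": 2, "wild": 1}),
        "d3": Counter({"car": 2, "engine": 1}),
        "d4": Counter({"cat": 1, "wild": 2}),
    }
    topics = {"automobile": {"car", "engine"}, "animal": {"cat", "wild"}}
    initial = [["d1", "d2"], ["d3", "d4"]]  # mixed-topic input clusters
    soft = split_phase(initial, docs, topics)
    final = merge_phase(soft, docs, threshold=0.3)
    print([sorted(c) for c in final])  # prints [['d1', 'd3'], ['d2', 'd4']]
```

In this toy run the split phase turns the two mixed input clusters into four single-topic sub-clusters, and the merge phase regroups them into one automobile cluster and one animal cluster. The quadratic pairwise scan is chosen for readability rather than efficiency; the paper's incremental update over a data stream is not modelled here.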

Keywords: topical clustering; query clustering; query context; document clustering; incremental clustering; soft clustering.
Date: 2018

Downloads: (external link)
http://www.inderscience.com/link.php?id=92536 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:10:y:2018:i:2:p:127-170

More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.
Handle: RePEc:ids:ijdmmm:v:10:y:2018:i:2:p:127-170