PDC-Transitive: An Enhanced Heuristic for Document Clustering Based on Relational Analysis Approach and Iterative MapReduce

Yasmine Lamari and Said Chah Slaoui
Additional contact information
Yasmine Lamari: Department of Computer Science, Faculty of Science of Rabat, Mohammed V University, 4 Avenue Ibn Battouta B. P. 1014 RP, Rabat, Morocco
Said Chah Slaoui: Department of Computer Science, Faculty of Science of Rabat, Mohammed V University, 4 Avenue Ibn Battouta B. P. 1014 RP, Rabat, Morocco

Journal of Information & Knowledge Management (JIKM), 2018, vol. 17, issue 02, 1-18

Abstract: Recently, MapReduce-based implementations of clustering algorithms have been developed to cope with the Big Data phenomenon, and they show promising results, particularly for the document clustering problem. In this paper, we extend an efficient data partitioning method based on the relational analysis (RA) approach and applied to the document clustering problem, called PDC-Transitive. The introduced heuristic is parallelised iteratively using the MapReduce model and designed with a single reducer, which represents a bottleneck when processing large data. We improved the design of the PDC-Transitive method to avoid data dependencies and reduce the computation cost. Experimental results on benchmark datasets demonstrate that the enhanced heuristic yields better-quality results and requires less computing time than the original method.

Keywords: Document clustering; hard clustering; Hadoop; MapReduce; partitioning heuristic; relational analysis approach; unsupervised clustering
Date: 2018

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649218500211
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:17:y:2018:i:02:n:s0219649218500211

DOI: 10.1142/S0219649218500211

Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Page updated 2025-03-20
Handle: RePEc:wsi:jikmxx:v:17:y:2018:i:02:n:s0219649218500211