PDC-Transitive: An Enhanced Heuristic for Document Clustering Based on Relational Analysis Approach and Iterative MapReduce
Yasmine Lamari and Said Chah Slaoui
Additional contact information
Yasmine Lamari: Department of Computer Science, Faculty of Science of Rabat, Mohammed V University, 4 Avenue Ibn Battouta B. P. 1014 RP, Rabat, Morocco
Said Chah Slaoui: Department of Computer Science, Faculty of Science of Rabat, Mohammed V University, 4 Avenue Ibn Battouta B. P. 1014 RP, Rabat, Morocco
Journal of Information & Knowledge Management (JIKM), 2018, vol. 17, issue 02, 1-18
Abstract:
Recently, MapReduce-based implementations of clustering algorithms have been developed to cope with the Big Data phenomenon, and they show promising results, particularly for the document clustering problem. In this paper, we extend PDC-Transitive, an efficient data partitioning method based on the relational analysis (RA) approach and applied to the document clustering problem. The heuristic is parallelised by applying the MapReduce model iteratively and is designed with a single reducer, which represents a bottleneck when processing large data. We improve the design of the PDC-Transitive method to avoid data dependencies and reduce the computation cost. Experimental results on benchmark datasets demonstrate that the enhanced heuristic yields better-quality results and requires less computing time than the original method.
Keywords: Document clustering; hard clustering; Hadoop; MapReduce; partitioning heuristic; relational analysis approach; unsupervised clustering
Date: 2018
Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649218500211
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:17:y:2018:i:02:n:s0219649218500211
DOI: 10.1142/S0219649218500211
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh
Publisher: World Scientific Publishing Co. Pte. Ltd.