EconPapers    

Clustering with Proximity Graphs: Exact and Efficient Algorithms

Michail Kazimianec and Nikolaus Augsten
Michail Kazimianec: Faculty of Economics, Vilnius University, Vilnius, Lithuania
Nikolaus Augsten: Faculty of Computer Science, Free University of Bozen-Bolzano, Bozen-Bolzano, Italy

International Journal of Knowledge-Based Organizations (IJKBO), 2013, vol. 3, issue 4, 84-104

Abstract: Graph Proximity Cleansing (GPC) is a string clustering algorithm that automatically detects cluster borders and has been used successfully for string cleansing. For each potential cluster, a so-called proximity graph is computed, and the cluster border is detected from this graph. However, computing the proximity graph is expensive, and state-of-the-art GPC algorithms only approximate it using a sampling technique. Further, the quality of GPC clusters has never been compared to standard clustering techniques such as k-means, density-based, or hierarchical clustering. In this article, the authors propose two efficient algorithms, PG-DS and PG-SM, for the exact computation of proximity graphs. They show experimentally that their solutions are faster than the sampling-based algorithms, even when those use very small sample sizes. The authors also provide a thorough experimental evaluation of GPC and conclude that it is very efficient and shows good clustering quality compared to the standard techniques. These results open a new perspective on string clustering in settings where no knowledge about the input data is available.

Date: 2013

Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 018/ijkbo.2013100105 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:igg:jkbo00:v:3:y:2013:i:4:p:84-104

International Journal of Knowledge-Based Organizations (IJKBO) is currently edited by John Wang

More articles in International Journal of Knowledge-Based Organizations (IJKBO) from IGI Global
Bibliographic data for series maintained by Journal Editor.

 
Page updated 2025-03-19
Handle: RePEc:igg:jkbo00:v:3:y:2013:i:4:p:84-104