CNODE: clustering of set-valued non-ordered discrete data
Sunil Kumar, Shamik Sural, Alok Watve and Sakti Pramanik
International Journal of Data Mining, Modelling and Management, 2009, vol. 1, issue 3, 310-334
Abstract:
This paper introduces a clustering technique named 'Clustering of set-valued Non-Ordered DiscretE data' (CNODE), in which each data item is a vector having a set of non-ordered discrete values per dimension. Since the usual definitions of distance, such as Euclidean and Manhattan, do not hold in a 'non-ordered discrete data space' (NDDS), other measures such as Hamming distance are often used to define the distance between vectors with single-valued discrete dimensions. This type of distance is not meaningful for set-valued dimensions; hence, we propose a similarity measure based on set intersection for clustering set-valued vectors. We also suggest a new measure, named 'lines of clustroids' (LOC), for determining the quality of clustering for this type of data. In contrast to other existing clustering techniques in NDDS, CNODE does not rely on any kind of pre-processing of the dataset. Experiments with synthetic and real datasets show that CNODE is robust to data variations, scalable to large datasets and efficient for high dimensions.
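To make the contrast in the abstract concrete, the following is a minimal sketch of Hamming distance on single-valued vectors next to an intersection-based similarity on set-valued vectors. The abstract does not define the paper's intersection coefficient, so the per-dimension normalisation used here (intersection size over union size, averaged across dimensions) is an assumption chosen for illustration, and the function names are hypothetical.

```python
from typing import FrozenSet, Sequence

# A set-valued vector: one set of non-ordered discrete values per dimension.
SetVector = Sequence[FrozenSet[str]]


def hamming_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the dimensions in which two single-valued vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x != y for x, y in zip(a, b))


def intersection_similarity(u: SetVector, v: SetVector) -> float:
    """Average per-dimension overlap between two set-valued vectors.

    ASSUMPTION: each dimension contributes |A & B| / |A | B|. The paper's
    own coefficient may be normalised differently.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    if not u:
        raise ValueError("vectors must have at least one dimension")
    total = 0.0
    for a, b in zip(u, v):
        union = a | b
        # Two empty sets are treated as identical in that dimension.
        total += len(a & b) / len(union) if union else 1.0
    return total / len(u)


if __name__ == "__main__":
    u = [frozenset({"a", "b"}), frozenset({"x"})]
    v = [frozenset({"b", "c"}), frozenset({"x"})]
    print(hamming_distance(["a", "x"], ["b", "x"]))
    print(intersection_similarity(u, v))
```

Under this assumed normalisation the similarity lies in [0, 1], with 1 meaning the two vectors hold identical sets in every dimension.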
Keywords: clustering; set-valued data; non-ordered discrete data; categorical data; intersection coefficient; clustroids.
Date: 2009
Downloads:
http://www.inderscience.com/link.php?id=27288 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:1:y:2009:i:3:p:310-334