Discovering Similarity Across Heterogeneous Features: A Case Study of Clinico-Genomic Analysis
Vandana P. Janeja, Josephine M. Namayanja, Yelena Yesha, Anuja Kench and Vasundhara Misal
Affiliations:
Vandana P. Janeja: University of Maryland, Baltimore County, USA
Josephine M. Namayanja: University of Massachusetts, Boston, USA
Yelena Yesha: University of Maryland, Baltimore County, USA
Anuja Kench: University of Maryland, Baltimore County, USA
Vasundhara Misal: University of Maryland, Baltimore County, USA
International Journal of Data Warehousing and Mining (IJDWM), 2020, vol. 16, issue 4, 63-83
Abstract:
Data that contain both continuous and categorical attributes form a heterogeneous mix that is challenging to cluster. Traditional clustering techniques such as k-means work well on small, homogeneous datasets, but as the data grow larger it becomes increasingly difficult to find meaningful, well-formed clusters. In this paper, the authors propose an approach built on a combined similarity function that measures similarity across both numeric and categorical features; this function is then used within a clustering algorithm to identify similarity between data objects. The findings indicate that the proposed approach handles heterogeneous data better, forming well-separated clusters.
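The abstract does not give the form of the combined similarity function or name the clustering algorithm. As a minimal sketch only, assuming a Gower-style combination (range-normalized similarity for numeric features, exact matching for categorical features, averaged with equal weight per feature) plugged into a simple k-medoids loop with distance defined as 1 minus similarity, the general idea might look like the following. The functions `feature_ranges`, `combined_similarity` and `k_medoids`, the farthest-first initialization, and the toy data are all hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Sequence, Union

Value = Union[float, str]
Row = Sequence[Value]


def feature_ranges(data: Sequence[Row], numeric_idx: Sequence[int]) -> Dict[int, float]:
    """Range (max - min) of each numeric column, used to normalize differences."""
    return {j: max(row[j] for row in data) - min(row[j] for row in data) for j in numeric_idx}


def combined_similarity(a: Row, b: Row, numeric_idx: Sequence[int],
                        categorical_idx: Sequence[int], ranges: Dict[int, float]) -> float:
    """Hypothetical Gower-style similarity in [0, 1] (an assumption, not the paper's formula)."""
    scores: List[float] = []
    for j in numeric_idx:
        r = ranges[j]
        # Numeric: 1 minus the range-normalized absolute difference (constant column -> 1).
        scores.append(1.0 if r == 0 else 1.0 - abs(a[j] - b[j]) / r)
    for j in categorical_idx:
        # Categorical: simple matching, 1 if the values are equal, else 0.
        scores.append(1.0 if a[j] == b[j] else 0.0)
    # Equal weight per feature; the paper may weight numeric and categorical parts differently.
    return sum(scores) / len(scores)


def k_medoids(data: Sequence[Row], k: int, numeric_idx: Sequence[int],
              categorical_idx: Sequence[int], max_iter: int = 100):
    """Simple alternating k-medoids using distance = 1 - combined similarity."""
    n = len(data)
    ranges = feature_ranges(data, numeric_idx)
    dist = [[1.0 - combined_similarity(data[i], data[j], numeric_idx, categorical_idx, ranges)
             for j in range(n)] for i in range(n)]

    def assign(medoids: List[int]) -> List[int]:
        # Each row joins the cluster of its nearest medoid (ties go to the lowest cluster index).
        return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    # Farthest-first initialization: start at row 0, then add the row farthest from chosen medoids.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster becomes empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the other members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return assign(medoids), medoids


# Toy mixed-type rows: two numeric features and one categorical feature (illustrative data).
rows = [
    (1.0, 20.0, "A"), (1.2, 21.0, "A"), (0.9, 19.5, "A"),
    (8.0, 70.0, "B"), (8.5, 72.0, "B"), (7.9, 69.0, "B"),
]
labels, medoids = k_medoids(rows, k=2, numeric_idx=[0, 1], categorical_idx=[2])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Normalizing each numeric difference by the column range keeps every per-feature score in [0, 1], so numeric and categorical features contribute on a comparable scale; a medoid-based algorithm is used here because, unlike k-means, it needs only pairwise distances and no mean of categorical values.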
Date: 2020
Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 018/IJDWM.2020100104 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:igg:jdwm00:v:16:y:2020:i:4:p:63-83
International Journal of Data Warehousing and Mining (IJDWM) is currently edited by Eric Pardede
More articles in International Journal of Data Warehousing and Mining (IJDWM) from IGI Global