Learning a metric when clustering data points in the presence of constraints
Ahmad Ali Abin (),
Mohammad Ali Bashiri () and
Hamid Beigy ()
Additional contact information
Ahmad Ali Abin: Shahid Beheshti University, G. C.
Mohammad Ali Bashiri: Sharif University of Technology
Hamid Beigy: Sharif University of Technology
Advances in Data Analysis and Classification, 2020, vol. 14, issue 1, No 3, 29-56
Abstract:
Abstract Learning an appropriate distance measure under supervision of side information has become a topic of significant interest within machine learning community. In this paper, we address the problem of metric learning for constrained clustering by considering three important issues: (1) considering importance degree for constraints, (2) preserving the topological structure of data, and (3) preserving some natural distribution properties in the data. This work provides a unified way to handle different issues in constrained clustering by learning an appropriate distance measure. It has modeled the first issue by injecting the importance degree of constraints directly into an objective function. The topological structure of data is preserved by minimizing the reconstruction error of data in the target space. Finally we addressed the issue of preserving natural distribution properties in the data by using the proximity information of data. We have proposed two different methods to address the above mentioned issues. The first approach learns a linear transformation of data into a target space (linear-model) and the second one uses kernel functions to learn an appropriate distance measure (non-linear-model). Experiments show that considering these issues significantly improves clustering accuracy.
Keywords: Constrained clustering; Metric learning; Instance-level constraints; Weighted constraints; 30C40; 54E35; 91C20 (search for similar items in EconPapers)
Date: 2020
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://link.springer.com/10.1007/s11634-019-00359-6 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:14:y:2020:i:1:d:10.1007_s11634-019-00359-6
Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2
DOI: 10.1007/s11634-019-00359-6
Access Statistics for this article
Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs
More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().