Improved Constrained k-Means Algorithm for Clustering with Domain Knowledge
Peihuang Huang, Pei Yao, Zhendong Hao, Huihong Peng and Longkun Guo
Additional contact information
Peihuang Huang: College of Mathematics and Data Science, Minjiang University, Fuzhou 350116, China
Pei Yao: College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
Zhendong Hao: College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
Huihong Peng: College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
Longkun Guo: School of Computer Science, Qilu University of Technology, Jinan 250353, China
Mathematics, 2021, vol. 9, issue 19, 1-14
Abstract:
With the rapid development of machine learning technology, emerging applications pose the challenge of using domain knowledge to improve clustering accuracy, since clustering, despite being fast, suffers from a compromised accuracy rate. In this paper, we model domain knowledge (i.e., background knowledge or side information) arising in some applications as must-link and cannot-link sets, to be used together with k-means for better accuracy. We first propose an algorithm for constrained k-means that considers only must-links. The key idea is to treat a set of data points constrained by must-links as a single data point whose weight equals the sum of the weights of the constrained points. Then, to cluster sets of data points with cannot-links, we employ minimum-weight matching to assign the data points to the existing clusters. Finally, we carried out numerical simulations to evaluate the proposed algorithms on UCI datasets, demonstrating that our method outperforms previous algorithms for constrained k-means, as well as traditional k-means, in clustering accuracy rate, although with a slightly longer practical runtime.
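The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a must-link group is placed at the weighted mean of its members (the abstract specifies only the summed weight), and it replaces a proper minimum-weight matching algorithm with a brute-force search over assignments, which is only practical for small k. The function names `merge_must_links` and `assign_cannot_link_set` are hypothetical.

```python
from itertools import permutations


def merge_must_links(points, weights, must_links):
    """Contract each must-link component into one weighted point.

    Assumption: the merged point sits at the weighted mean of its members;
    its weight is the sum of the member weights.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    merged = []
    for members in groups.values():
        w = sum(weights[i] for i in members)
        dim = len(points[members[0]])
        center = tuple(
            sum(weights[i] * points[i][d] for i in members) / w
            for d in range(dim)
        )
        merged.append((center, w, members))
    return merged


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_cannot_link_set(group, centers):
    """Assign mutually cannot-linked points to distinct centers at minimum
    total squared distance. Brute force stands in for a real
    minimum-weight matching routine."""
    if len(group) > len(centers):
        raise ValueError("more cannot-linked points than clusters")
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(centers)), len(group)):
        cost = sum(sq_dist(p, centers[c]) for p, c in zip(group, perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return list(best), best_cost
```

In this sketch, two cannot-linked points that are both closest to the same center are forced onto different centers, whereas plain nearest-center assignment would place them together.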
Keywords: constrained k-means; minimum weight matching; side information; domain knowledge
JEL-codes: C
Date: 2021
Downloads:
https://www.mdpi.com/2227-7390/9/19/2390/pdf (application/pdf)
https://www.mdpi.com/2227-7390/9/19/2390/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:9:y:2021:i:19:p:2390-:d:643269
Mathematics is currently edited by Ms. Emma He