
Improved Constrained k-Means Algorithm for Clustering with Domain Knowledge

Peihuang Huang, Pei Yao, Zhendong Hao, Huihong Peng and Longkun Guo
Additional contact information
Peihuang Huang: College of Mathematics and Data Science, Minjiang University, Fuzhou 350116, China
Pei Yao: College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
Zhendong Hao: College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
Huihong Peng: College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
Longkun Guo: School of Computer Science, Qilu University of Technology, Jinan 250353, China

Mathematics, 2021, vol. 9, issue 19, 1-14

Abstract: With the rapid development of machine learning technology, emerging applications pose the challenge of using domain knowledge to improve clustering accuracy, since clustering, despite its fast processing, often suffers from a compromised accuracy rate. In this paper, we model the domain knowledge (i.e., background knowledge or side information) arising in some applications as must-link and cannot-link sets, so that it can be combined with k-means for better accuracy. We first propose an algorithm for constrained k-means that considers only must-links. The key idea is to treat a set of data points constrained by must-links as a single data point whose weight equals the sum of the weights of the constrained points. Then, to cluster the data point sets with cannot-links, we employ minimum-weight matching to assign the data points to the existing clusters. Finally, we carry out numerical simulations to evaluate the proposed algorithms on UCI datasets, demonstrating that our method outperforms previous algorithms for constrained k-means, as well as traditional k-means, in clustering accuracy, although with a slightly compromised practical runtime.
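The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `merge_must_links` and `assign_cannot_link_set` are hypothetical, placing each merged point at the weighted centroid of its group is an assumption (the abstract only specifies the summed weight), and the brute-force search over permutations stands in for a proper minimum-weight matching routine, so it is only suitable for small cannot-link sets.

```python
from itertools import permutations


def merge_must_links(points, weights, must_links):
    """Contract each must-link component into one weighted point.

    Assumption: the merged point sits at the weighted centroid of its
    members; its weight is the sum of the member weights.
    Returns a list of (centroid, weight, member_indices).
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    dim = len(points[0])
    merged = []
    for members in groups.values():
        w = sum(weights[i] for i in members)
        centroid = tuple(
            sum(weights[i] * points[i][d] for i in members) / w
            for d in range(dim)
        )
        merged.append((centroid, w, members))
    return merged


def assign_cannot_link_set(cl_points, centers):
    """Assign mutually cannot-linked points to distinct centers.

    Brute-force stand-in for minimum-weight bipartite matching:
    tries every injective assignment and keeps the one with the
    smallest total squared distance. Requires
    len(cl_points) <= len(centers).
    """
    def sqdist(p, c):
        return sum((x - y) ** 2 for x, y in zip(p, c))

    best_cost, best = float("inf"), None
    for perm in permutations(range(len(centers)), len(cl_points)):
        cost = sum(sqdist(p, centers[j]) for p, j in zip(cl_points, perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return list(best), best_cost


# Points 0 and 1 must share a cluster, so they collapse into one
# point of weight 2 before any k-means step is run.
merged = merge_must_links(
    [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)], [1.0, 1.0, 1.0], [(0, 1)]
)

# Two cannot-linked points both lie nearest to center 0, but the
# matching forces them into different clusters.
assignment, cost = assign_cannot_link_set(
    [(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (5.0, 0.0)]
)
```

In this sketch a plain nearest-center rule would place both cannot-linked points in cluster 0, while the matching picks the cheapest assignment that keeps them apart. For larger instances a polynomial-time assignment solver would replace the permutation search.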

Keywords: constrained k-means; minimum weight matching; side information; domain knowledge
JEL-codes: C
Date: 2021

Downloads: (external link)
https://www.mdpi.com/2227-7390/9/19/2390/pdf (application/pdf)
https://www.mdpi.com/2227-7390/9/19/2390/ (text/html)


Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:9:y:2021:i:19:p:2390-:d:643269


Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:9:y:2021:i:19:p:2390-:d:643269