Singular-Value-Based Cluster Number Detection Method
Yating Li,
Jianghui Cai,
Haifeng Yang,
Jie Wang,
Chenhui Shi,
Bo Liang,
Xujun Zhao and
Yaling Xun
Additional contact information
Yating Li: School of Electronic Information Engineering, Taiyuan University of Science and Technology (TYUST), Taiyuan 030024, China
Jianghui Cai: School of Computer Science and Technology, Taiyuan University of Science and Technology (TYUST), Taiyuan 030024, China
Haifeng Yang: School of Computer Science and Technology, Taiyuan University of Science and Technology (TYUST), Taiyuan 030024, China
Jie Wang: School of Computer Science and Technology, Taiyuan University of Science and Technology (TYUST), Taiyuan 030024, China
Chenhui Shi: School of Electronic Information Engineering, Taiyuan University of Science and Technology (TYUST), Taiyuan 030024, China
Bo Liang: School of Computer Science and Technology, Taiyuan Normal University (TYNU), Jinzhong 030619, China
Xujun Zhao: School of Computer Science and Technology, Taiyuan University of Science and Technology (TYUST), Taiyuan 030024, China
Yaling Xun: School of Computer Science and Technology, Taiyuan University of Science and Technology (TYUST), Taiyuan 030024, China
Mathematics, 2025, vol. 13, issue 3, 1-20
Abstract:
The cluster number directly affects the clustering result and its usefulness in real-world applications, so determining it is one of the key issues in cluster analysis. In singular value decomposition (SVD), the characteristic directions of the larger singular values likely represent the primary data patterns, trends, or structures that carry the main information. In clustering analysis, this main information and structure are likely related to the cluster structure itself: the number of larger singular values may correspond to the number of clusters, and the information they carry may correspond to the different clusters. Based on this observation, a singular-value-based cluster number detection method is proposed. First, a transferred K-nearest neighbors (TKNN) density formula is proposed to address a limitation of the density peaks clustering (DPC) algorithm, which fails to identify centroids in sparse clusters of unbalanced datasets. Second, core data are selected by the DPC algorithm with the modified density formula to better capture the data distribution. Third, a sparse similarity matrix is constructed from the selected core data to further highlight the relationships between data points and enhance the distribution of data features. Finally, SVD is performed on the sparse similarity matrix to obtain the singular values, and the cumulative contribution rate is used to determine the number of relatively large singular values (i.e., the cluster number). Experimental results show that the method is superior in determining the cluster number for datasets with complex shapes.
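The final step of the abstract (SVD followed by a cumulative contribution rate) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `count_large_singular_values`, the 0.9 threshold, and the definition of the contribution rate as a ratio of plain (unsquared) singular values are all assumptions made for illustration, and the TKNN density, DPC core-point selection, and sparse similarity construction are skipped in favor of an idealized block-diagonal similarity matrix.

```python
import numpy as np


def count_large_singular_values(similarity, threshold=0.9):
    """Return the smallest k whose top-k singular values reach `threshold`
    of the total singular-value sum (assumed definition of the
    cumulative contribution rate; the paper's exact form may differ)."""
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(similarity, compute_uv=False)
    cumulative = np.cumsum(singular_values) / np.sum(singular_values)
    # Index of the first cumulative rate >= threshold, converted to a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against rounding pushing k past the number of singular values.
    return min(k, len(singular_values))


# Idealized similarity matrix (hypothetical data): three groups of sizes
# 4, 5 and 6, similarity 1 inside a group and 0 between groups.
sizes = [4, 5, 6]
similarity = np.zeros((sum(sizes), sum(sizes)))
start = 0
for size in sizes:
    similarity[start:start + size, start:start + size] = 1.0
    start += size

estimated_clusters = count_large_singular_values(similarity, threshold=0.9)
print(estimated_clusters)  # 3
```

Each all-ones block contributes exactly one nonzero singular value (equal to its size), so three singular values account for the whole sum here. On real data the within-cluster similarities are not uniform, and the count can be quite sensitive to the threshold and to how the similarity matrix is built, which is presumably why the paper's preprocessing steps matter.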
Keywords: cluster number; singular value; DPC; sigmoid function; cumulative contribution rate
JEL-codes: C
Date: 2025
Downloads:
https://www.mdpi.com/2227-7390/13/3/527/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/3/527/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:3:p:527-:d:1584271
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.