EconPapers    
Economics at your fingertips  
 

Novel Automated K-means++ Algorithm for Financial Data Sets

Guoyu Du, Xuehua Li, Lanjie Zhang, Libo Liu and Chaohua Zhao

Mathematical Problems in Engineering, 2021, vol. 2021, 1-12

Abstract:

The K-means algorithm has been extensively investigated in the field of text clustering because of its linear time complexity and adaptation to sparse matrix data. However, it has two main problems, namely, the determination of the number of clusters and the location of the initial cluster centres. In this study, we propose an improved K-means++ algorithm based on the Davies-Bouldin index (DBI) and the largest sum of distance called the SDK-means++ algorithm. Firstly, we use the term frequency-inverse document frequency to represent the data set. Secondly, we measure the distance between objects by cosine similarity. Thirdly, the initial cluster centres are selected by comparing the distance to existing initial cluster centres and the maximum density. Fourthly, clustering results are obtained using the K-means++ method. Lastly, DBI is used to obtain optimal clustering results automatically. Experimental results on real bank transaction volume data sets show that the SDK-means++ algorithm is more effective and efficient than two other algorithms in organising large financial text data sets. The F-measure value of the proposed algorithm is 0.97. The running time of the SDK-means++ algorithm is reduced by 42.9% and 22.4% compared with that for K-means and K-means++ algorithms, respectively.

Date: 2021
References: Add references at CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://downloads.hindawi.com/journals/MPE/2021/5521119.pdf (application/pdf)
http://downloads.hindawi.com/journals/MPE/2021/5521119.xml (text/xml)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlmpe:5521119

DOI: 10.1155/2021/5521119

Access Statistics for this article

More articles in Mathematical Problems in Engineering from Hindawi
Bibliographic data for series maintained by Mohamed Abdelhakeem ().

 
Page updated 2025-03-19
Handle: RePEc:hin:jnlmpe:5521119