Clustering based on Kolmogorov–Smirnov statistic with application to bank card transaction data

Yingqiu Zhu, Qiong Deng, Danyang Huang, Bingyi Jing and Bo Zhang

Journal of the Royal Statistical Society Series C, 2021, vol. 70, issue 3, 558-578

Abstract: Rapid developments in third‐party online payment platforms now make it possible to record massive bank card transaction data. Clustering on such transaction data is of great importance for the analysis of merchant behaviours. However, traditional methods based on generated features inevitably lead to much loss of information. To make better use of bank card transaction data, this study investigates the possibility of using the empirical cumulative distribution of transaction amounts. As the distance between two merchants can be measured using the two‐sample Kolmogorov–Smirnov test statistic, we propose the Kolmogorov–Smirnov K‐means clustering approach based on this distance measure. An approximation step is conducted to ensure the feasibility of the proposed method even for large‐scale transaction data, and the associated theoretical properties are investigated. Both simulations and an empirical study demonstrate that our method outperforms feature‐based methods and is computationally efficient for large‐scale data sets.
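
As an illustration of the idea in the abstract, the two-sample Kolmogorov–Smirnov statistic between merchants i and j is D(i, j) = sup_x |F_i(x) - F_j(x)|, where F_i is the empirical cumulative distribution of merchant i's transaction amounts. The following is a minimal sketch of a K-means-style procedure built on that distance, not the authors' implementation. Several choices are assumptions made here for illustration only: each empirical distribution is evaluated on a shared grid of amounts (so the supremum is approximated by a maximum over grid points, which may differ from the paper's approximation step), a cluster centre is taken to be the average of its members' distribution curves, and the initial centres are picked by a farthest-first rule. The function names `ecdf_on_grid`, `ks_distance` and `ks_kmeans` are hypothetical.

```python
import numpy as np


def ecdf_on_grid(sample, grid):
    """Evaluate the empirical CDF of `sample` at each point of `grid`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the observations that are <= each grid point
    return np.searchsorted(sample, grid, side="right") / sample.size


def ks_distance(f, g):
    """Grid approximation of the KS statistic: max |F - G| over grid points."""
    return float(np.max(np.abs(np.asarray(f) - np.asarray(g))))


def ks_kmeans(samples, k, grid, n_iter=50, seed=0):
    """Cluster samples by the KS distance between their empirical CDFs.

    Assumptions of this sketch (not taken from the paper): centres are the
    mean ECDF of their members, and seeding is farthest-first.
    """
    rng = np.random.default_rng(seed)
    grid = np.asarray(grid, dtype=float)
    F = np.vstack([ecdf_on_grid(s, grid) for s in samples])  # shape (n, len(grid))

    # Farthest-first seeding: random first centre, then the curve that is
    # farthest (in KS distance) from all centres chosen so far.
    first = int(rng.integers(len(F)))
    chosen = [first]
    d_min = np.max(np.abs(F - F[first]), axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(d_min))
        chosen.append(nxt)
        d_min = np.minimum(d_min, np.max(np.abs(F - F[nxt]), axis=1))
    centroids = F[chosen].copy()

    labels = np.full(len(F), -1)
    for _ in range(n_iter):
        # KS distance from every ECDF to every centre: shape (n, k)
        dist = np.max(np.abs(F[:, None, :] - centroids[None, :, :]), axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = F[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

A call might look like `labels, centres = ks_kmeans(amounts_by_merchant, k=3, grid=np.linspace(0, 1000, 201))`, where `amounts_by_merchant` is a hypothetical list holding one array of transaction amounts per merchant. Because every merchant is reduced to a fixed-length vector once, each iteration costs on the order of n × k × (grid size) regardless of how many transactions a merchant has, which is the kind of saving an approximation step is meant to provide.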

Date: 2021

Downloads: (external link)
https://doi.org/10.1111/rssc.12471

Persistent link: https://EconPapers.repec.org/RePEc:bla:jorssc:v:70:y:2021:i:3:p:558-578

Journal of the Royal Statistical Society Series C is currently edited by R. Chandler and P. W. F. Smith

More articles in Journal of the Royal Statistical Society Series C from Royal Statistical Society Contact information at EDIRC.

 
Page updated 2021-06-05
Handle: RePEc:bla:jorssc:v:70:y:2021:i:3:p:558-578