Clustering based on Kolmogorov–Smirnov statistic with application to bank card transaction data
Bingyi Jing and
Journal of the Royal Statistical Society Series C, 2021, vol. 70, issue 3, 558-578
Rapid developments in third‐party online payment platforms now make it possible to record massive bank card transaction data. Clustering on such transaction data is of great importance for the analysis of merchant behaviours. However, traditional methods based on generated features inevitably lead to much loss of information. To make better use of bank card transaction data, this study investigates the possibility of using the empirical cumulative distribution of transaction amounts. As the distance between two merchants can be measured using the two‐sample Kolmogorov–Smirnov test statistic, we propose the Kolmogorov–Smirnov K‐means clustering approach based on this distance measure. An approximation step is conducted to ensure the feasibility of the proposed method even for large‐scale transaction data, and the associated theoretical properties are investigated. Both simulations and an empirical study demonstrate that our method outperforms feature‐based methods and is computationally efficient for large‐scale data sets.
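The distance and clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`ks_distance`, `ecdf_on_grid`, `ks_kmeans`), the fixed evaluation grid, and the choice of the average member ECDF as the cluster centre are all assumptions made for illustration, and the paper's own approximation step may differ.

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: sup_x |F_a(x) - F_b(x)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The supremum is attained at one of the observed points.
    points = np.concatenate([a, b])
    fa = np.searchsorted(a, points, side="right") / a.size
    fb = np.searchsorted(b, points, side="right") / b.size
    return float(np.max(np.abs(fa - fb)))

def ecdf_on_grid(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, grid, side="right") / s.size

def ks_kmeans(samples, k, grid, n_iter=20, seed=0):
    """K-means-style clustering of samples under a sup-norm ECDF distance.

    Assumptions (not taken from the paper): ECDFs are compared on a fixed
    grid, and each centre is the average ECDF of its current members.
    """
    rng = np.random.default_rng(seed)
    F = np.vstack([ecdf_on_grid(s, grid) for s in samples])
    centers = F[rng.choice(len(samples), size=k, replace=False)]
    labels = np.full(len(samples), -1)
    for _ in range(n_iter):
        # Grid-approximated KS distance from every sample to every centre.
        dist = np.abs(F[:, None, :] - centers[None, :, :]).max(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = F[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Here each element of `samples` would hold one merchant's transaction amounts. Evaluating every ECDF on a shared grid keeps the cost per merchant independent of its transaction count, which is one plausible way to make such a method feasible at scale.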
Persistent link: https://EconPapers.repec.org/RePEc:bla:jorssc:v:70:y:2021:i:3:p:558-578
Journal of the Royal Statistical Society Series C is currently edited by R. Chandler and P. W. F. Smith