Data quality issues leading to sub optimal machine learning for money laundering models
Abhishek Gupta,
Dwijendra Nath Dwivedi,
Jigar Shah and
Ashish Jain
Journal of Money Laundering Control, 2021, vol. 25, issue 3, 551-555
Abstract:
Purpose - Good quality input data is critical to developing a robust machine learning model for identifying possible money laundering transactions. McKinsey, at one of the ACAMS conferences, cited data quality as one of the reasons artificial intelligence use cases in compliance struggle. Concerns are often raised about the data quality of predictors, such as wrong transaction codes or industry classifications. However, there has not been much discussion of the most critical variable in machine learning, the definition of an event, i.e. the date on which the suspicious activity report (SAR) is filed.
Design/methodology/approach - The team analyzed the transaction behavior of four major banks spread across Asia and Europe. Based on the findings, the team created a synthetic database comprising 2,000 SAR customers mimicking the time of investigation and case closure. In this paper, the authors focus on one very specific area of data quality, the definition of an event, i.e. the SAR/suspicious transaction report.
Findings - The analysis of a few of the banks in Asia and Europe suggests that getting the event definition right can by itself improve the effectiveness of the model and reduce the prediction span, i.e. the time lag between the money laundering transaction and the prediction of money laundering as an alert for investigation.
Research limitations/implications - The analysis drew on existing experience of situations where the time between alert and case closure is long (anywhere from 15 days to 10 months). The team could not quantify the impact of this finding because no such actual case has been observed so far.
Originality/value - The key finding of the paper is that money launderers appear to either increase or reduce their level of activity in the most recent quarter. This does not match their real behavior: they typically show a spike in activity through various means during money laundering. The mismatch in turn affects the quality of insights that the model is trained on. The authors believe that once financial institutions start speeding up investigations of high-risk cases, the scatter plot of SAR behavior will change significantly, leading to better capture of money laundering behavior and a faster and more precise “catch” rate.
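The event-definition point can be illustrated with a minimal Python sketch. This is not the authors' method or data: the function names (`window_total`, `prediction_span`), the 90-day lookback, and every date and amount below are hypothetical, chosen only to show how anchoring a "recent quarter" feature window to a late SAR filing date can miss the spike that an alert-date anchor would capture.

```python
from datetime import date, timedelta

def window_total(transactions, event_date, lookback_days=90):
    """Sum amounts dated within the lookback window that ends just before event_date."""
    start = event_date - timedelta(days=lookback_days)
    return sum(amount for day, amount in transactions if start <= day < event_date)

def prediction_span(activity_date, alert_date):
    """Days between the suspicious activity and the alert raised for investigation."""
    return (alert_date - activity_date).days

# Hypothetical customer: one transaction per week for 35 weeks,
# with a four-week spike in March 2021 (all values are illustrative).
SPIKE_START, SPIKE_END = date(2021, 3, 1), date(2021, 3, 22)
transactions = []
for week in range(35):
    day = date(2021, 1, 4) + timedelta(weeks=week)
    amount = 50_000 if SPIKE_START <= day <= SPIKE_END else 1_000
    transactions.append((day, amount))

ALERT_DATE = date(2021, 4, 1)      # hypothetical alert date
SAR_FILED_DATE = date(2021, 9, 1)  # hypothetical filing date, about 5 months later

alert_anchored = window_total(transactions, ALERT_DATE)
filing_anchored = window_total(transactions, SAR_FILED_DATE)
span_days = prediction_span(SPIKE_START, ALERT_DATE)

print("Recent-quarter total, anchored at alert date:", alert_anchored)
print("Recent-quarter total, anchored at SAR filing date:", filing_anchored)
print("Prediction span in days:", span_days)
```

In this toy setup the alert-anchored window totals 209,000 because it contains the spike, while the filing-anchored window totals 13,000 and contains only baseline activity. A model labeled on the filing date would therefore see a "recent quarter" with no spike at all.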
Keywords: Data quality; Anti money laundering; Machine learning; Model efficiency
Date: 2021
Downloads: (external link)
https://www.emerald.com/insight/content/doi/10.110 ... d&utm_campaign=repec (text/html)
https://www.emerald.com/insight/content/doi/10.110 ... d&utm_campaign=repec (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:eme:jmlcpp:jmlc-05-2021-0049
DOI: 10.1108/JMLC-05-2021-0049
Journal of Money Laundering Control is currently edited by Dr Li Hong Xing and Prof Barry Rider
More articles in Journal of Money Laundering Control from Emerald Group Publishing Limited
Bibliographic data for series maintained by Emerald Support.