Stacking machine-learning models for anomaly detection: comparing AnaCredit to other banking datasets
Pasquale Maddaloni,
Davide Nicola Continanza,
Andrea del Monaco,
Daniele Figoli,
Marco di Lucido,
Filippo Quarta and
Giuseppe Turturiello
Additional contact information
Davide Nicola Continanza: Bank of Italy
Daniele Figoli: Bank of Italy
Marco di Lucido: Bank of Italy
Filippo Quarta: Bank of Italy
Giuseppe Turturiello: Bank of Italy
No 689, Questioni di Economia e Finanza (Occasional Papers) from Bank of Italy, Economic Research and International Relations Area
Abstract:
This paper addresses the issue of assessing, via machine learning models, the quality of granular datasets reported by banks. In particular, it investigates how supervised and unsupervised learning algorithms can exploit patterns recognizable in other data sources that cover similar phenomena (although at a different level of aggregation) in order to detect potential outliers to be submitted to banks for their own checks. These algorithms are then stacked in a semi-supervised fashion to enhance their individual outlier detection ability. The methodology is applied to compare the granular AnaCredit dataset first with the Balance Sheet Items statistics (BSI) and then with the harmonised supervisory statistics of the Financial Reporting (FinRep), which are compiled for the Eurosystem and the Single Supervisory Mechanism, respectively. In both cases, we show that the stacking technique achieves a higher F1-score than each algorithm alone.
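The stacking idea summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the two base detectors (a ratio check and a Theil-Sen regression residual, used here as simple stand-ins for the auto-encoder and robust regression named in the keywords), the thresholds `HIGH` and `LOW`, and the nearest-centroid meta-learner are all assumptions chosen to keep the example self-contained. It only shows the general pattern: score each observation with several detectors, pseudo-label the cases where the detectors agree confidently, and let a meta-learner trained on those pseudo-labels decide the rest.

```python
from itertools import combinations
from statistics import mean, median


def robust_z(values):
    """Absolute robust z-score based on the median and the MAD."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1e-12  # avoid division by zero
    return [abs(v - med) / (1.4826 * mad) for v in values]


def theil_sen(x, y):
    """Theil-Sen robust regression: median of all pairwise slopes."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Synthetic data (assumption): totals aggregated from a granular source and
# the matching figure from an aggregate report, with two injected errors.
granular = [100 + 10 * i for i in range(20)]
aggregate = [g + ((7 * i) % 5 - 2) * 0.5 for i, g in enumerate(granular)]
true_anomalies = {3, 15}
aggregate[3] += 40.0
aggregate[15] -= 35.0

# Base detector 1: robust z-score of the aggregate-to-granular ratio.
ratio_score = robust_z(a / g for a, g in zip(aggregate, granular))

# Base detector 2: robust z-score of the Theil-Sen regression residual.
slope, intercept = theil_sen(granular, aggregate)
resid_score = robust_z(a - (intercept + slope * g)
                       for g, a in zip(granular, aggregate))

# Pseudo-labelling: label only the observations on which both detectors
# agree confidently; everything else stays unlabelled.
HIGH, LOW = 3.5, 1.0  # hypothetical thresholds
features = list(zip(ratio_score, resid_score))
pseudo = {}
for i, (a, b) in enumerate(features):
    if a > HIGH and b > HIGH:
        pseudo[i] = 1  # confident anomaly
    elif a < LOW and b < LOW:
        pseudo[i] = 0  # confident normal


def centroid(label):
    """Mean score vector of the pseudo-labelled points of one class.

    Assumes at least one pseudo-label exists for the class.
    """
    pts = [features[i] for i, lab in pseudo.items() if lab == label]
    return tuple(mean(c) for c in zip(*pts))


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Meta-learner (assumption): nearest centroid in the space of base scores,
# fitted on pseudo-labels and applied to every observation.
c_norm, c_anom = centroid(0), centroid(1)
flagged = [i for i, f in enumerate(features)
           if dist2(f, c_anom) < dist2(f, c_norm)]

# F1-score against the injected anomalies.
tp = len(set(flagged) & true_anomalies)
precision = tp / len(flagged) if flagged else 0.0
recall = tp / len(true_anomalies)
f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
print("flagged:", flagged, "F1:", f1)
```

On real data the injected anomalies would be replaced by whatever confirmed cases are available, and the base detectors and meta-learner would typically be richer models; the sketch only fixes the order of the steps.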
Keywords: banking data; data quality management; outlier and anomaly detection; machine learning; auto-encoder; robust regression; pseudo labelling
JEL-codes: C18 C81 G21
Date: 2022-04
New Economics Papers: this item is included in nep-ban, nep-big and nep-cmp
Downloads:
https://www.bancaditalia.it/pubblicazioni/qef/2022-0689/QEF_689_22.pdf (application/pdf)
Related works:
Chapter: Stacking machine learning models for anomaly detection: comparing AnaCredit to other banking data sets (2023) 
Persistent link: https://EconPapers.repec.org/RePEc:bdi:opques:qef_689_22