Fraud Detection by Integrating Multisource Heterogeneous Presence-Only Data

Qiu, Yongqin; Chen, Yuanxing; Fang, Kan; Yu, Lean; Fang, Kuangnan

Yongqin Qiu, Yuanxing Chen, Kan Fang, Lean Yu and Kuangnan Fang
Additional contact information
Yongqin Qiu: International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
Yuanxing Chen: Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China
Kan Fang: College of Management and Economics, Tianjin University, Tianjin 300072, China; and Laboratory of Computation and Analytics of Complex Management Systems (CACMS), Tianjin University, Tianjin 300072, China
Lean Yu: Business School, Sichuan University, Chengdu 610065, China
Kuangnan Fang: School of Economics, Xiamen University, Xiamen 361005, China

INFORMS Journal on Computing, 2025, vol. 37, issue 4, 998-1017

Abstract: In credit fraud detection practice, certain fraudulent transactions often evade detection because of the hidden nature of fraudulent behavior. To address this issue, financial institutions are increasingly adopting positive-unlabeled (PU) learning techniques. However, most of these methods are designed for single data sets and do not take into account the heterogeneity of data when they are collected from different sources. In this paper, we propose an integrative PU learning method (I-PU) for pooling information from multiple heterogeneous PU data sets. A novel approach that penalizes group differences is developed to explicitly and automatically identify the cluster structures of coefficients across different data sets, thus offering a plausible interpretation of heterogeneity. Furthermore, we apply a bilevel selection method to detect the sparse structure at both the group level and within-group level. Theoretically, we show that our proposed estimator has the oracle property. Computationally, we design an expectation-maximization (EM) algorithm framework and propose an alternating direction method of multipliers (ADMM) algorithm to solve it. Simulation results show that our proposed method has better numerical performance in terms of variable selection, parameter estimation, and prediction ability. Finally, a real-world application showcases the effectiveness of our method in identifying distinct coefficient clusters and its superior prediction performance compared with direct data merging or separate modeling. This result also offers valuable insights for financial institutions in developing targeted fraud detection systems.

Keywords: fraud detection; integrative analysis; clustering; variable selection; PU learning
Date: 2025
Downloads:
http://dx.doi.org/10.1287/ijoc.2023.0366 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:orijoc:v:37:y:2025:i:4:p:998-1017