A Framework of Global Credit-Scoring Modeling Using Outlier Detection and Machine Learning in a P2P Lending Platform

Shih, Dong-Her; Wu, Ting-Wei; Shih, Po-Yuan; Lu, Nai-An; Shih, Ming-Hung

A Framework of Global Credit-Scoring Modeling Using Outlier Detection and Machine Learning in a P2P Lending Platform

Dong-Her Shih, Ting-Wei Wu, Po-Yuan Shih, Nai-An Lu and Ming-Hung Shih
Additional contact information
Dong-Her Shih: Department of Information Management, National Yunlin University of Science and Technology, Douliu 64002, Taiwan
Ting-Wei Wu: Department of Information Management, National Yunlin University of Science and Technology, Douliu 64002, Taiwan
Po-Yuan Shih: Department of Finance, National Yunlin University of Science and Technology, Douliu 64002, Taiwan
Nai-An Lu: Department of Information Management, National Yunlin University of Science and Technology, Douliu 64002, Taiwan
Ming-Hung Shih: Department of Electrical and Computer Engineering, Iowa State University, 2520 Osborn Drive, Ames, IA 50011, USA

Mathematics, 2022, vol. 10, issue 13, 1-13

Abstract: A great challenge for credit-scoring models in online peer-to-peer (P2P) lending platforms is that credit-scoring models simply discard rejected applicants. This selective discard can lead to an inability to increase the number of potentially qualified applicants, ultimately affecting the revenue of the lending platform. One way to deal with this is to employ reject inference, a technique that infers the state of a rejected sample and incorporates the results into a credit-scoring model. The most popular approach to reject inference is to use a credit-scoring model built only on accepted samples to directly predict the status of rejected samples. However, the distribution of accepted samples in online P2P lending is different from the distribution of rejected samples, and the credit-scoring model on the original accepted sample may no longer apply. In addition, the acceptance sample may also include applicants who cannot repay the loan. If these applicants can be filtered out, the losses to the lending platform can also be reduced. Therefore, we propose a global credit-scoring model framework that combines multiple feature selection methods and classifiers to better evaluate the model after adding rejected samples. In addition, this study uses outlier detection methods to explore the internal relationships of all samples, which can delete outlier applicants in accepted samples or increase outlier applicants in rejected samples. Finally, this study uses four data samples and reject inference to construct four different credit-scoring models. The experimental results show that the credit-scoring model combining Pearson and random forest proposed in this study has significantly better accuracy and AUC than other scholars. Compared with previous studies, using outlier detection to remove outliers in loan acceptance samples and identify potentially creditworthy loan applicants from loan rejection samples is a good strategy. Furthermore, this study not only improves the accuracy of the credit-scoring model but also increases the number of lenders, which in turn increases the profitability of the lending platform.

Keywords: financial markets; credit scoring; machine learning; P2P lending (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/13/2282/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/13/2282/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:13:p:2282-:d:851596

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().