Will they repay their debt? Identification of borrowers likely to be charged off
Caplescu Raluca Dana,
Panaite Ana-Maria,
Pele Daniel Traian and
Strat Vasile Alecsandru
Additional contact information
Caplescu Raluca Dana: Bucharest University of Economic Studies, Bucharest, Romania
Panaite Ana-Maria: Bucharest University of Economic Studies, Bucharest, Romania
Pele Daniel Traian: Bucharest University of Economic Studies, Bucharest, Romania
Strat Vasile Alecsandru: Bucharest University of Economic Studies, Bucharest, Romania
Management & Marketing, 2020, vol. 15, issue 3, 393-409
Abstract:
The recent increase in peer-to-peer lending has prompted the development of models that separate good clients from bad ones in order to mitigate risks for both lenders and platforms. The rapidly growing body of literature provides several comparisons between various models. Among the most frequently employed are logistic regression, Support Vector Machines, neural networks and decision tree-based models. Of these, logistic regression has proved to be a strong candidate because of both its good performance and its high explainability. The present paper compares four pairs of models (each fitted on imbalanced and on under-sampled data) meant to predict charged-off clients by optimizing the F1 score. We found that, if the data is balanced, Logistic Regression, both simple and with Stochastic Gradient Descent, outperforms LightGBM and K-Nearest Neighbors in optimizing the F1 score. We chose this metric because it balances the interests of the lenders and those of the platform. Loan term, debt-to-income ratio and number of accounts were found to be important predictors, positively related to the risk of charge-off. At the other end of the spectrum, by far the strongest impact on charge-off probability is that of the FICO score. The final number of features retained by the two Logistic Regression models differs considerably because, although both use Lasso for feature selection, Stochastic Gradient Descent Logistic Regression uses stronger regularization. The analysis was performed using Python (numpy, pandas, sklearn and imblearn).
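The two preprocessing and evaluation ideas named in the abstract, random under-sampling and the F1 score, can be illustrated with a minimal sketch. The abstract reports that the analysis used imblearn and sklearn; the version below deliberately uses only the Python standard library. The function names (`undersample`, `f1_score`), the toy labels and the `loan_id` field are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop rows of the larger classes until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Keep n_min randomly chosen rows from each class.
        for row in rng.sample(group, n_min):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: define F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = charged off, 0 = fully paid (20 vs. 80 loans).
labels = [1] * 20 + [0] * 80
rows = [{"loan_id": i} for i in range(100)]
bal_rows, bal_labels = undersample(rows, labels)
class_counts = Counter(bal_labels)  # both classes now have 20 rows

# Hypothetical predictions: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_f1 = f1_score(y_true, y_pred)  # precision = recall = 0.75
print(class_counts, toy_f1)
```

In this toy case precision and recall are both 0.75, so the F1 score is also 0.75. Because F1 penalizes both missed charge-offs (false negatives) and wrongly rejected borrowers (false positives), it is one way to weigh the two kinds of error against each other.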
Keywords: peer-to-peer lending; creditworthiness; Logistic Regression; KNN; LightGBM
Date: 2020
Downloads:
https://doi.org/10.2478/mmcks-2020-0023 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:vrs:manmar:v:15:y:2020:i:3:p:393-409:n:4
DOI: 10.2478/mmcks-2020-0023
Management & Marketing is currently edited by Alina Mihaela Dima
More articles in Management & Marketing from Sciendo
Bibliographic data for series maintained by Peter Golla.