How magic a bullet is machine learning for credit analysis? An exploration with fintech lending data
J. Christina Wang and
Charles B. Perkins
Journal of Credit Risk
Abstract:
Fintech lending to consumers has grown rapidly since the 2007–9 Great Recession. This study applies machine learning (ML) methods to loan-level data from the largest fintech lender of personal loans to assess the extent to which these methods produce more accurate out-of-sample default predictions than standard regression models, as fintech lending's advocates argue. To explain loan outcomes, the analysis accounts for the economic conditions a borrower faces after origination, which are typically absent from other ML studies of default. For the given data, the ML methods do improve prediction accuracy, with larger gains over horizons within a year. Having more data, up to but not beyond a certain quantity, enhances the relative predictive accuracy of the ML methods, likely because of data or model drift over time, so that more complex models can suffer more out-of-sample misses. Prediction accuracy rises only marginally with additional standard credit variables beyond the core set, suggesting that unconventional data must be sufficiently informative as a whole to help consumers with little or no credit history. Finally, in this data, we find little statistically significant evidence that ML methods yield unequal benefits across subgroups of borrowers defined by their risk attributes, their income or where they live.
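The abstract's central comparison is out-of-sample: models are fit on earlier loans and judged on later ones, which is also where data or model drift shows up. A minimal sketch of that evaluation step follows, assuming an out-of-time split by origination year and the area under the ROC curve (AUC) as the accuracy metric. The abstract states neither the exact split nor the metric used. The field names `orig_year`, `default`, `score_logit` and `score_ml` are hypothetical, and the toy rows are made-up illustrative values, not data or results from the study. In practice the training rows would be used to fit a regression and an ML model; here their predicted default probabilities are simply given.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical loan record: origination year, realized default flag (0/1),
# and predicted default probabilities from two already-fitted models.
Loan = Dict[str, float]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the share of (default, non-default)
    pairs in which the defaulted loan gets the higher score; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one default and one non-default")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def out_of_time_split(loans: List[Loan], cutoff_year: int) -> Tuple[List[Loan], List[Loan]]:
    """Train on loans originated before cutoff_year; test on the rest, so the
    test set comes strictly later in time than the training set."""
    train = [l for l in loans if l["orig_year"] < cutoff_year]
    test = [l for l in loans if l["orig_year"] >= cutoff_year]
    return train, test


# Made-up toy rows for illustration only (hypothetical scores, not study data).
TOY_LOANS: List[Loan] = [
    {"orig_year": 2015, "default": 1, "score_logit": 0.60, "score_ml": 0.70},
    {"orig_year": 2016, "default": 0, "score_logit": 0.20, "score_ml": 0.10},
    {"orig_year": 2017, "default": 1, "score_logit": 0.30, "score_ml": 0.40},
    {"orig_year": 2017, "default": 0, "score_logit": 0.35, "score_ml": 0.10},
    {"orig_year": 2018, "default": 1, "score_logit": 0.20, "score_ml": 0.25},
    {"orig_year": 2018, "default": 0, "score_logit": 0.10, "score_ml": 0.15},
]

if __name__ == "__main__":
    _, test = out_of_time_split(TOY_LOANS, 2017)
    y = [int(l["default"]) for l in test]
    for col in ("score_logit", "score_ml"):
        print(col, auc(y, [l[col] for l in test]))
    # prints "score_logit 0.5" then "score_ml 1.0"
```

Using a time-ordered split rather than a random one is the design choice that matters here: a random split mixes old and new loans and can hide the drift-related out-of-sample misses the abstract describes.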
Downloads: https://www.risk.net/node/7961585 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:rsk:journ1:7961585