Precision Without Labels: Detecting Cross-Applicants in Mortgage Data Using Unsupervised Learning

Elzayn, Hadi; Freyaldenhoven, Simon; Shin, Minchul

No 25-25, Working Papers from Federal Reserve Bank of Philadelphia

Abstract: We develop a clustering-based algorithm to detect loan applicants who submit multiple applications (“cross-applicants”) in a loan-level dataset without personal identifiers. A key innovation of our approach is a novel evaluation method that does not require labeled training data, allowing us to optimize the tuning parameters of our machine learning algorithm. By applying this methodology to Home Mortgage Disclosure Act (HMDA) data, we create a unique dataset that consolidates mortgage applications to the individual applicant level across the United States. Our preferred specification identifies cross-applicants with 92.3% precision.

Pages: 17
Date: 2025-09-02
New Economics Papers: this item is included in nep-cmp and nep-ure

Downloads: (external link)
https://www.philadelphiafed.org/-/media/FRBP/Asset ... ers/2025/wp25-25.pdf (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:fip:fedpwp:101559

DOI: 10.21799/frbp.wp.2025.25