Precision Without Labels: Detecting Cross-Applicants in Mortgage Data Using Unsupervised Learning
Hadi Elzayn,
Simon Freyaldenhoven and
Minchul Shin
No 25-25, Working Papers from Federal Reserve Bank of Philadelphia
Abstract:
We develop a clustering-based algorithm to detect loan applicants who submit multiple applications (“cross-applicants”) in a loan-level dataset without personal identifiers. A key innovation of our approach is a novel evaluation method that does not require labeled training data, allowing us to optimize the tuning parameters of our machine learning algorithm. By applying this methodology to Home Mortgage Disclosure Act (HMDA) data, we create a unique dataset that consolidates mortgage applications to the individual applicant level across the United States. Our preferred specification identifies cross-applicants with 92.3% precision.
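To make the reported metric concrete: precision is the share of flagged matches that are in fact correct. The sketch below is only an illustration, not the authors' evaluation method, which by design works without labels. It assumes hypothetical ground-truth applicant labels are available and computes pair-level precision for a set of predicted clusters. The function name, application IDs, and labels are all invented for the example.

```python
from itertools import combinations


def pairwise_precision(predicted_clusters, true_applicant):
    """Share of predicted same-applicant pairs that truly share an applicant.

    predicted_clusters: iterable of sets of application IDs (hypothetical).
    true_applicant: dict mapping application ID -> true applicant label
                    (hypothetical; the paper's setting has no such labels).
    """
    predicted_pairs = set()
    for cluster in predicted_clusters:
        # Every pair of applications inside one cluster is a predicted match.
        for a, b in combinations(sorted(cluster), 2):
            predicted_pairs.add((a, b))
    if not predicted_pairs:
        return float("nan")  # precision is undefined with no predicted matches
    correct = sum(
        1 for a, b in predicted_pairs if true_applicant[a] == true_applicant[b]
    )
    return correct / len(predicted_pairs)


# Toy data: app3 is wrongly grouped with app1 and app2.
clusters = [{"app1", "app2", "app3"}, {"app4", "app5"}, {"app6"}]
truth = {
    "app1": "A", "app2": "A", "app3": "B",
    "app4": "C", "app5": "C", "app6": "D",
}
precision = pairwise_precision(clusters, truth)
```

In this toy case, four pairs are predicted and two are correct, so the pair-level precision is 0.5.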
Pages: 17
Date: 2025-09-02
New Economics Papers: this item is included in nep-cmp
Downloads: (external link)
https://www.philadelphiafed.org/-/media/FRBP/Asset ... ers/2025/wp25-25.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:fip:fedpwp:101559
DOI: 10.21799/frbp.wp.2025.25