High dimensional, robust, unsupervised record linkage

Bera Sabyasachi and Chatterjee Snigdhansu
Additional contact information
Bera Sabyasachi: University of Minnesota, Minnesota, United States
Chatterjee Snigdhansu: University of Minnesota, Minnesota, United States

Statistics in Transition New Series, 2020, vol. 21, issue 4, 123-143

Abstract: We develop a technique for record linkage on high dimensional data, where the two datasets may not have any common variable, and there may be no training set available. Our methodology is based on sparse, high dimensional principal components. Since large and high dimensional datasets are often prone to outliers and aberrant observations, we propose a technique for estimating robust, high dimensional principal components. We present theoretical results validating the robust, high dimensional principal component estimation steps, and justifying their use for record linkage. Some numeric results and remarks are also presented.

Keywords: record linkage; principal components; high dimensional; robust
Date: 2020

Downloads: (external link)
https://doi.org/10.21307/stattrans-2020-034 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:vrs:stintr:v:21:y:2020:i:4:p:123-143:n:11

DOI: 10.21307/stattrans-2020-034

Statistics in Transition New Series is currently edited by Włodzimierz Okrasa

More articles in Statistics in Transition New Series from Statistics Poland
Bibliographic data for series maintained by Peter Golla.

 
Page updated 2025-03-20
Handle: RePEc:vrs:stintr:v:21:y:2020:i:4:p:123-143:n:11