
Optimizing data quality in big data through unsupervised record linkage techniques

Aissam Bendida, Amar Bensaber Djamel, Réda Adjoudj and Yahia Atig

Edelweiss Applied Science and Technology, 2025, vol. 9, issue 6, 846-863

Abstract: In today's era of Big Data, maintaining high-quality data is crucial for effective data management. One key aspect of this is record linkage, which involves identifying, comparing, and merging records from different sources that refer to the same real-world entity. However, traditional record linkage methods struggle to keep up with the rapidly increasing volume and diversity of data. These methods often rely on labeled data, which can be expensive and difficult to obtain. To overcome these challenges, unsupervised blocking techniques have emerged as a promising alternative, allowing large-scale datasets to be managed efficiently without the need for pre-labeled data. In this article, we introduce a novel approach that integrates the Firefly Algorithm for optimized feature selection, Locality-Sensitive Hashing (LSH) for dimensionality reduction, and Length-based Feature Weighting (LFW) for improved data representation. Our methodology aims to enhance both the accuracy and scalability of record linkage in Big Data environments. Experimental results show that our approach is highly effective, demonstrating its potential to significantly improve data quality in large-scale datasets.
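
As a rough illustration of the unsupervised blocking idea mentioned in the abstract, the following minimal sketch groups records into candidate blocks using MinHash signatures with banded Locality-Sensitive Hashing, so that only records sharing a bucket are compared. It is not the authors' implementation: the Firefly-based feature selection and Length-based Feature Weighting steps are omitted, and every name and parameter here (`shingles`, `minhash_signature`, `lsh_candidate_pairs`, `num_hashes`, `bands`, `k`) is a hypothetical choice made for this example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a case- and space-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) <= k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _seeded_hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum seeded hash over the token set, per seed."""
    return tuple(min(_seeded_hash(seed, t) for t in tokens)
                 for seed in range(num_hashes))


def lsh_candidate_pairs(records, num_hashes=32, bands=16, k=3):
    """Block records without labels: records whose signatures agree on at
    least one band land in the same bucket and become candidate pairs."""
    if num_hashes % bands != 0:
        raise ValueError("num_hashes must be divisible by bands")
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rec_id, text in records.items():
        sig = minhash_signature(shingles(text, k), num_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(rec_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    sample = {
        "r1": "John Smith, 12 Main Street",
        "r2": "john  smith, 12 main street",
        "r3": "Maria Garcia, 7 Ocean Avenue",
    }
    print(sorted(lsh_candidate_pairs(sample)))
```

Records that are identical after normalization always share every band and are therefore always paired. For merely similar records, the chance of sharing a bucket grows with their Jaccard similarity, and the `bands` and `num_hashes` settings trade recall against the number of comparisons.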

Keywords: Blocking; Locality-sensitive-hashing; Firefly algorithm; Length-based feature weighting; Record Linkage
Date: 2025

Downloads: (external link)
https://learning-gate.com/index.php/2576-8484/article/view/7970/2709 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:ajp:edwast:v:9:y:2025:i:6:p:846-863:id:7970

More articles in Edelweiss Applied Science and Technology from Learning Gate
Bibliographic data for series maintained by Melissa Fernandes.

 
Page updated 2025-06-12
Handle: RePEc:ajp:edwast:v:9:y:2025:i:6:p:846-863:id:7970