Detective Gadget: Generic Iterative Entity Resolution over Dirty Data
Marcello Buoncristiano, Giansalvatore Mecca, Donatello Santoro and Enzo Veltri
Additional contact information
Marcello Buoncristiano: Svelto!—Big Data-Cleaning and Analytics, 85100 Potenza, Italy
Giansalvatore Mecca: Dipartimento di Ingegneria, Università degli Studi della Basilicata, 85100 Potenza, Italy
Donatello Santoro: Dipartimento di Ingegneria, Università degli Studi della Basilicata, 85100 Potenza, Italy
Enzo Veltri: Dipartimento di Ingegneria, Università degli Studi della Basilicata, 85100 Potenza, Italy
Data, 2024, vol. 9, issue 12, 1-32
Abstract:
In the era of Big Data, entity resolution (ER), i.e., the process of identifying which records refer to the same real-world entity, plays a critical role in data-integration tasks. This is especially true in mission-critical applications where accuracy is mandatory, since we want to avoid both merging distinct entities and missing true matches. However, existing approaches struggle with rapidly changing data and with dirty data, which require iterative refinement over time. We present Detective Gadget, a novel system for iterative ER that seamlessly integrates data cleaning into the ER workflow. Detective Gadget employs an alias-based hashing mechanism for fast and scalable matching, check functions to detect and correct mismatches, and a human-in-the-loop framework to refine results through expert feedback. The system iteratively improves data quality and matching accuracy by leveraging evidence from both automated and manual decisions. Extensive experiments across diverse real-world scenarios demonstrate its effectiveness, achieving high accuracy and efficiency while adapting to evolving datasets.
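The abstract names the mechanisms but does not describe how they work. As a rough illustration only, here is one generic way alias-based hashing and a check function could fit together; it is a sketch under stated assumptions, not the authors' implementation. The alias normalization rule, the functions `normalize`, `resolve` and `conflicting_clusters`, the sample records and the `tax_id` attribute are all hypothetical.

```python
from collections import defaultdict


def normalize(alias):
    """Hypothetical alias normalization: lowercase, keep only alphanumerics."""
    return "".join(ch for ch in alias.lower() if ch.isalnum())


def resolve(aliases_by_id):
    """Cluster record ids that share at least one normalized alias key.

    Each key is looked up in a hash table (a dict), so candidate matches are
    found without pairwise comparisons; records sharing a key are merged
    transitively with union-find.
    """
    parent = {rid: rid for rid in aliases_by_id}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_owner = {}  # alias key -> first record id that produced it
    for rid, aliases in aliases_by_id.items():
        for alias in aliases:
            key = normalize(alias)
            if not key:
                continue  # skip aliases that normalize to an empty key
            if key in first_owner:
                root_a, root_b = find(first_owner[key]), find(rid)
                if root_a != root_b:
                    parent[root_b] = root_a
            else:
                first_owner[key] = rid

    clusters = defaultdict(set)
    for rid in aliases_by_id:
        clusters[find(rid)].add(rid)
    return sorted(sorted(c) for c in clusters.values())


def conflicting_clusters(clusters, attribute_by_id):
    """Hypothetical check function: flag clusters whose members disagree on an
    attribute assumed to be stable, so a human reviewer can confirm or split them."""
    flagged = []
    for cluster in clusters:
        values = {attribute_by_id[rid] for rid in cluster if attribute_by_id.get(rid)}
        if len(values) > 1:
            flagged.append(cluster)
    return flagged


# Hypothetical dirty sample: r4 is a Globex record carrying a wrong alias, "ACME-SPA".
records = {
    "r1": ["ACME S.p.A.", "acme spa"],
    "r2": ["Acme SpA"],
    "r3": ["Globex Corp"],
    "r4": ["globex corp.", "ACME-SPA"],
}
tax_id = {"r1": "IT001", "r2": "IT001", "r3": "IT777", "r4": "IT777"}

clusters = resolve(records)
print(clusters)                                # [['r1', 'r2', 'r3', 'r4']]
print(conflicting_clusters(clusters, tax_id))  # [['r1', 'r2', 'r3', 'r4']]
```

In this toy run, the single dirty alias on `r4` chains the Acme and Globex records into one cluster. Hash-based matching alone cannot notice this, but the check function flags the cluster because its members disagree on `tax_id`, which is the kind of case one would route to an expert for review.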
Keywords: entity resolution; iterative; algorithms; design; performance
JEL-codes: C8 C80 C81 C82 C83
Date: 2024
Downloads:
https://www.mdpi.com/2306-5729/9/12/139/pdf (application/pdf)
https://www.mdpi.com/2306-5729/9/12/139/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:9:y:2024:i:12:p:139-:d:1528760
Data is currently edited by Ms. Cecilia Yang
More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.