TPBF: Two-Phase Bloom-Filter-Based End-to-End Data Integrity Verification Framework for Object-Based Big Data Transfer Systems
Preethika Kasu,
Prince Hamandawana and
Tae-Sun Chung
Additional contact information
Preethika Kasu: Department of Artificial Intelligence, Ajou University, Suwon 16499, Korea
Prince Hamandawana: Department of Computer Science and Engineering, Soongsil University, Seoul 06978, Korea
Tae-Sun Chung: Department of Artificial Intelligence, Ajou University, Suwon 16499, Korea
Mathematics, 2022, vol. 10, issue 9, 1-25
Abstract:
Computational science simulations produce huge volumes of data for scientific research organizations. This data is often shared among geographically distributed data centers for storage and analysis. Data corruption along the end-to-end transmission path is one of the major challenges in distributing data geographically, so end-to-end integrity verification is critical for transmitting such data across data centers effectively. Although several data integrity techniques exist, most significantly reduce the data transmission rate and increase storage overhead. Existing techniques are therefore not viable in high-performance computing environments, where transferring huge volumes of data across data centers is very common. In this study, we propose a two-phase Bloom-filter-based end-to-end data integrity verification framework for object-based big data transfer systems. The proposed solution handles data integrity errors while reducing memory and storage overhead and minimizing the impact on the overall data transmission rate. We evaluated the memory, storage, and data transfer rate overheads that the proposed framework imposes on overall data transfer performance. The experimental findings showed that the proposed framework incurred 5% overhead on the total data transmission rate and 10% overhead on total memory usage. However, we observed significant savings in storage requirements compared with state-of-the-art solutions.
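The abstract does not describe the internals of the two-phase design, so the following is only a generic sketch of how a Bloom filter can support integrity checks on transferred objects, not the authors' method. The sender records a key for each object in the filter, and the receiver recomputes the key for each received object and tests membership. The class `BloomFilter`, the helpers `object_key`, `build_filter`, and `find_suspect_objects`, the filter size, the hash count, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over byte strings (sizes chosen for illustration)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def object_key(object_id, payload):
    # Hypothetical key format: object id bound to the payload's digest.
    return object_id.encode() + b":" + hashlib.sha256(payload).digest()


def build_filter(objects):
    """Sender side: record every (id, payload) pair in one filter."""
    bf = BloomFilter()
    for object_id, payload in objects.items():
        bf.add(object_key(object_id, payload))
    return bf


def find_suspect_objects(sender_filter, received):
    """Receiver side: ids whose recomputed key is absent from the filter."""
    return sorted(oid for oid, payload in received.items()
                  if object_key(oid, payload) not in sender_filter)


if __name__ == "__main__":
    sent = {"obj-1": b"alpha", "obj-2": b"beta"}
    received = {"obj-1": b"alpha", "obj-2": b"bet4"}  # obj-2 corrupted
    print(find_suspect_objects(build_filter(sent), received))
```

A Bloom filter never reports an inserted key as absent, so an intact object is never flagged. It can report an absent key as present, so a corrupted object may pass undetected with a small probability that depends on the filter size, the number of hash functions, and the number of objects inserted.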
Keywords: big data; geo-distributed data centers; data integrity; Bloom filter; parallel file system; high-performance computing
JEL-codes: C
Date: 2022
Downloads:
https://www.mdpi.com/2227-7390/10/9/1591/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/9/1591/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:9:p:1591-:d:810719
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.