Representing a Model for the Anonymization of Big Data Stream Using In-Memory Processing

Shamsinejad, Elham; Banirostam, Touraj; Pedram, Mir Mohsen; Rahmani, Amir Masoud

Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram and Amir Masoud Rahmani
Additional contact information
Elham Shamsinejad: Islamic Azad University
Touraj Banirostam: Islamic Azad University
Mir Mohsen Pedram: Kharazmi University
Amir Masoud Rahmani: Islamic Azad University

Annals of Data Science, 2025, vol. 12, issue 1, No 10, 223-252

Abstract: In light of the escalating privacy risks in the big data era, this paper introduces an innovative model for the anonymization of big data streams, leveraging in-memory processing within the Spark framework. The approach is founded on the principle of K-anonymity and propels the field forward by critically evaluating various anonymization methods and algorithms, benchmarking their performance with respect to time and space complexities. A distinctive formula for optimized cluster determination in the K-means algorithm is presented, along with a novel tuple expiration time strategy for the efficient purging of clusters. The integration of these components into Spark’s RDD and MLlib modules results in a significant decrease in execution time and data loss rates, even with increasing data volumes. The paper’s notable contributions are its methodological advancements that offer a robust, scalable solution for data anonymization, safeguarding user privacy without sacrificing data utility or processing efficiency.
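
For readers unfamiliar with the K-anonymity principle named in the abstract, the following is a minimal sketch of the standard definition only: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. It is not the paper's model, cluster formula, or Spark pipeline. The function name `is_k_anonymous`, the field names, and the sample records are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from typing import Mapping, Sequence


def is_k_anonymous(
    records: Sequence[Mapping[str, str]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> bool:
    """Return True if each quasi-identifier combination occurs at least k times."""
    # Count how many records share each combination of quasi-identifier values.
    group_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    # The table is k-anonymous only if no group is smaller than k.
    return all(size >= k for size in group_sizes.values())


# Hypothetical, already-generalized records (illustrative data only).
sample = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(sample, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(sample, ["age", "zip"], 3)
```

Here each (age, zip) combination appears twice, so the sample satisfies the check for k = 2 but not for k = 3. In a streaming setting such as the one the abstract describes, a check of this kind would presumably be applied per cluster before release, but that is an assumption rather than something the record states.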

Keywords: Big data; Anonymity; Confidentiality; Data disclosure; Privacy
Date: 2025

Downloads:
http://link.springer.com/10.1007/s40745-024-00556-x Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:12:y:2025:i:1:d:10.1007_s40745-024-00556-x

Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745

DOI: 10.1007/s40745-024-00556-x

Annals of Data Science is currently edited by Yong Shi

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.