Dynamic Deduplication Decision in a Hadoop Distributed File System
Ruay-Shiung Chang, Chih-Shan Liao, Kuo-Zheng Fan and Chia-Ming Wu
International Journal of Distributed Sensor Networks, 2014, vol. 10, issue 4, 630380
Abstract:
In the big data era, users generate and update data at tremendous speed, from any device, at any time, and anywhere. Coping with such diverse data in real time is a major challenge. The Hadoop distributed file system (HDFS) is designed to handle data for building a distributed data center. HDFS keeps duplicates of data to increase reliability. However, these duplicates require a great deal of extra storage space and infrastructure funding. Deduplication can improve the utilization of storage space effectively. In this paper, we propose a dynamic deduplication decision to improve the storage utilization of a data center that uses HDFS as its file system. The proposed system formulates a suitable deduplication strategy to make full use of the storage space available on limited storage devices. The strategy deletes useless duplicates to free storage space. The experimental results show that our method efficiently improves the storage utilization of a data center using HDFS.
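The abstract does not give the decision rule itself, so the following is only a minimal sketch of one way a "delete useless duplicates when storage is tight" policy could look. Everything in it is an assumption made for illustration and is not taken from the paper: the `FileEntry` record, the `last_access` field, the utilization `threshold`, the `min_replicas` floor, and the choice to trim the least recently accessed files first. The starting replica count of 3 mirrors the default HDFS replication factor.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical bookkeeping record; not an HDFS API type."""
    name: str
    size_gb: int
    replicas: int
    last_access: int  # assumed logical timestamp; larger means more recent


def used_space(files):
    """Total space consumed, counting every replica."""
    return sum(f.size_gb * f.replicas for f in files)


def reduce_replicas(files, capacity_gb, threshold=0.5, min_replicas=1):
    """Drop replicas of the coldest files until usage <= threshold * capacity.

    Returns the names of files that lost a replica, one entry per replica
    removed. The policy (coldest first, fixed threshold) is an assumption.
    """
    target = threshold * capacity_gb
    removed = []
    # Visit the least recently accessed files first.
    for f in sorted(files, key=lambda f: f.last_access):
        while f.replicas > min_replicas and used_space(files) > target:
            f.replicas -= 1
            removed.append(f.name)
    return removed


files = [
    FileEntry("a", 10, 3, last_access=1),
    FileEntry("b", 10, 3, last_access=2),
    FileEntry("c", 10, 3, last_access=3),
]
removed = reduce_replicas(files, capacity_gb=100, threshold=0.5)
```

In this toy run the two coldest files are trimmed to a single copy and the most recently accessed file keeps all three. On a real cluster the per-file replication factor would be changed through HDFS itself, for example with `hdfs dfs -setrep`, rather than by editing an in-memory record.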
Date: 2014
Downloads: (external link)
https://journals.sagepub.com/doi/10.1155/2014/630380 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:sae:intdis:v:10:y:2014:i:4:p:630380
DOI: 10.1155/2014/630380