Dynamic Deduplication Decision in a Hadoop Distributed File System

Chang, Ruay-Shiung; Liao, Chih-Shan; Fan, Kuo-Zheng; Wu, Chia-Ming

International Journal of Distributed Sensor Networks, 2014, vol. 10, issue 4, 630380

Abstract: In the big data era, users generate and update data extremely quickly, from any device, at any time, and in any place. Coping with such varied data in real time is a major challenge. The Hadoop Distributed File System (HDFS) is designed to manage data for a distributed data center. HDFS keeps duplicate copies of data to increase reliability. However, these duplicates require considerable extra storage space and infrastructure funding. Deduplication can effectively improve storage utilization. In this paper, we propose a dynamic deduplication decision scheme to improve the storage utilization of a data center that uses HDFS as its file system. The proposed system formulates a deduplication strategy that makes full use of the available storage space when storage devices are limited. The strategy deletes unneeded duplicates to free storage space. Experimental results show that our method efficiently improves the storage utilization of a data center using HDFS.

Date: 2014

Downloads: (external link)
https://journals.sagepub.com/doi/10.1155/2014/630380 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:sae:intdis:v:10:y:2014:i:4:p:630380

DOI: 10.1155/2014/630380
