A Detailed Review on the Prominent Compression Methods Used for Reducing the Data Volume of Big Data

Anuradha, D.; Bhuvaneswari, S.

D. Anuradha and S. Bhuvaneswari
D. Anuradha: Pondicherry University
S. Bhuvaneswari: Pondicherry University

Annals of Data Science, 2016, vol. 3, issue 1, No 3, 47-62

Abstract: The volume of Big data is the primary challenge faced by today’s electronic world. Compressing this huge volume of data is important for improving the overall performance of Big data management systems and Big data analytics. Quite a few compression methods can reduce the cost of data management and data transfer and improve the efficiency of data analysis. The adaptive data compression approach determines the suitable data compression technique and the location of the data compression. De-duplication removes duplicate data from the Big data store. The resemblance detection and elimination algorithm uses two techniques, namely Dup-Adj and an improved super-feature approach, to separate similar data chunks from non-similar data chunks. Delta compression is also used to compress the data before storage. General compression algorithms are computationally complex and degrade application response time. To address this, the application-specific ZIP-IO framework for FPGA-accelerated compression is studied. In this framework, a simple instruction trace entropy compression algorithm is implemented in the FPGA substrate. The Record-aware Compression (RaC) technique, implemented in Hadoop MapReduce, guarantees that when compressed data is split, no data block contains partial records.
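
As a rough illustration of the de-duplication idea mentioned in the abstract, the sketch below stores each distinct chunk once and keeps an ordered list of chunk digests for reconstruction. It is not taken from the article: the fixed-size chunking, the SHA-256 fingerprint, and the names `deduplicate`, `restore`, and `chunk_size` are all assumptions chosen for brevity.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 8):
    """Split data into fixed-size chunks and keep one copy of each distinct chunk.

    Hypothetical helper for illustration only; fixed-size chunking is an assumption.
    """
    store = {}   # digest -> chunk bytes (each unique chunk is stored once)
    recipe = []  # ordered digests needed to rebuild the original input
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a duplicate chunk adds nothing to the store
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original bytes from the chunk store and the digest list."""
    return b"".join(store[digest] for digest in recipe)


sample = b"ABCDEFGH" * 3 + b"12345678"
store, recipe = deduplicate(sample)
```

Here the sample is split into four chunks, three of which are identical, so only two chunks end up in the store while the recipe still allows the input to be rebuilt exactly.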

Keywords: De-duplication; Resemblance detection; Super-feature; Delta compression; FPGA; ZIP-IO; Record-aware Compression
Date: 2016

Downloads:
http://link.springer.com/10.1007/s40745-016-0069-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:3:y:2016:i:1:d:10.1007_s40745-016-0069-9

Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745

DOI: 10.1007/s40745-016-0069-9

Annals of Data Science is currently edited by Yong Shi

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.