A Context-Based Performance Enhancement Algorithm for Columnar Storage in MapReduce with Hive
Yashvardhan Sharma, Saurabh Verma, Sumit Kumar and Shivam U.
Additional contact information
Yashvardhan Sharma: Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani, India
Saurabh Verma: Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani, India
Sumit Kumar: Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani, India
Shivam U.: Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani, India
International Journal of Cloud Applications and Computing (IJCAC), 2013, vol. 3, issue 4, 38-50
Abstract:
To achieve high reliability and scalability, most large-scale data warehouse systems have adopted a cluster-based architecture. In this context, MapReduce has emerged as a promising architecture for large-scale data warehousing and data analytics on commodity clusters. The MapReduce framework offers several attractive features, such as high fault tolerance, scalability, and the ability to run on a wide range of hardware, from low-end to high-end. These benefits, however, come at a substantial cost in performance. In this paper, we propose the design of a novel cluster-based data warehouse system, Daenyrys, for data processing on Hadoop, the open-source Apache implementation of the MapReduce framework. Daenyrys is a data management system that decides on the optimum partitioning scheme for Hadoop's distributed file system (DFS). The optimum partitioning scheme, which depends on the query context, improves the performance of the complete framework. In Daenyrys, columns are formed into optimized groups that provide the basis for partitioning tables vertically. An algorithm in Daenyrys monitors the context of current queries and, based on its observations, re-partitions the DFS for better performance and resource utilization. In the proposed system, Hive, a MapReduce-based SQL-like query engine, is supported on top of the DFS.
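The abstract does not spell out how columns are grouped, so the following is only a minimal sketch of the general idea of query-context-driven vertical partitioning, not the Daenyrys algorithm itself. It assumes a hypothetical query log in which each entry is the set of columns one query read, and it merges columns whose co-access ratio (Jaccard similarity) reaches a hypothetical threshold `min_affinity`. The function name `group_columns`, the threshold, and the sample table are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def group_columns(columns, query_log, min_affinity=0.5):
    """Group columns that recent queries tend to read together.

    columns: all column names of the table (hypothetical input).
    query_log: list of sets; each set holds the columns one query read.
    min_affinity: assumed Jaccard threshold for merging two columns.
    """
    known = set(columns)
    access = Counter()     # number of queries reading each column
    co_access = Counter()  # number of queries reading each column pair
    for cols in query_log:
        cols = sorted(set(cols) & known)
        access.update(cols)
        co_access.update(combinations(cols, 2))

    # Union-find over columns: merged columns share one root.
    parent = {c: c for c in columns}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), both in co_access.items():
        either = access[a] + access[b] - both
        if both / either >= min_affinity:
            parent[find(a)] = find(b)

    groups = {}
    for c in columns:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Illustrative workload: two query "contexts" over one table.
columns = ["id", "name", "price", "qty", "notes"]
query_log = [
    {"price", "qty"},
    {"price", "qty"},
    {"id", "name"},
    {"id", "name", "price"},
]
print(group_columns(columns, query_log))
```

Re-running the function on a fresh window of the query log would yield new groups whenever the workload shifts, which is the point at which a system of this kind might choose to re-partition. Merging by pairwise threshold is a simple heuristic and makes no claim to produce an optimum grouping.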
Date: 2013
Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 018/ijcac.2013100104 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:igg:jcac00:v:3:y:2013:i:4:p:38-50
International Journal of Cloud Applications and Computing (IJCAC) is currently edited by B. B. Gupta
More articles in International Journal of Cloud Applications and Computing (IJCAC) from IGI Global