SQL query optimization for highly normalized Big data

Golov N.I. and Ronnback L.
Additional contact information
Golov N.I.: National Research University Higher School of Economics
Ronnback L.: Stockholm University

Бизнес-информатика, 2015, issue 3 (33), 7-14

Abstract: This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors of the article for a Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high speed of data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and estimated for row-based and column-based databases. Theoretical estimates and the results of experiments on real data, carried out in a column-based MPP environment (HP Vertica), are presented and compared. The results show that the approach is particularly favorable when available RAM is scarce, so that execution of ad-hoc queries switches from pure in-memory processing to spilling over to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using clickstream data from Avito, the biggest classifieds site in Russia.

Keywords: BIG DATA; MASSIVELY PARALLEL PROCESSING (MPP); DATABASE; NORMALIZATION; ANALYTICS; AD-HOC; QUERYING; MODELING; PERFORMANCE
Date: 2015

Downloads: (external link)
http://cyberleninka.ru/article/n/sql-query-optimiz ... -normalized-big-data

Persistent link: https://EconPapers.repec.org/RePEc:scn:025686:16084594

More articles in Бизнес-информатика from CyberLeninka, Federal State Autonomous Educational Institution of Higher Education "National Research University Higher School of Economics"
Bibliographic data for series maintained by CyberLeninka.

 
Page updated 2025-03-20
Handle: RePEc:scn:025686:16084594