EconPapers    
Economics at your fingertips  
 

Cache-Based Multi-Query Optimization for Data-Intensive Scalable Computing Frameworks

Pietro Michiardi (), Damiano Carra () and Sara Migliorini ()
Additional contact information
Pietro Michiardi: Eurecom
Damiano Carra: University of Verona
Sara Migliorini: University of Verona

Information Systems Frontiers, 2021, vol. 23, issue 1, No 3, 35-51

Abstract: Abstract In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in redundant and wasteful processing, multi-query optimization techniques can be employed to save a considerable amount of cluster resources. In this work, we introduce a novel method combining in-memory cache primitives and multi-query optimization, to improve the efficiency of data-intensive, scalable computing frameworks. By careful selection and exploitation of common (sub)expressions, while satisfying memory constraints, our method transforms a batch of queries into a new, more efficient one which avoids unnecessary recomputations. To find feasible and efficient execution plans, our method uses a cost-based optimization formulation akin to the multiple-choice knapsack problem. Extensive experiments on a prototype implementation of our system show significant benefits of worksharing for both TPC-DS workloads and detailed micro-benchmarks.

Keywords: Large scale computing; Caching; Optimization (search for similar items in EconPapers)
Date: 2021
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://link.springer.com/10.1007/s10796-020-09995-2 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:infosf:v:23:y:2021:i:1:d:10.1007_s10796-020-09995-2

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10796

DOI: 10.1007/s10796-020-09995-2

Access Statistics for this article

Information Systems Frontiers is currently edited by Ram Ramesh and Raghav Rao

More articles in Information Systems Frontiers from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-20
Handle: RePEc:spr:infosf:v:23:y:2021:i:1:d:10.1007_s10796-020-09995-2