Optimization and Extension of Stream-Relation Joins
M. Asif Naeem
Additional contact information
M. Asif Naeem: Auckland University of Technology, Auckland, New Zealand
International Journal of Information Technology & Decision Making (IJITDM), 2019, vol. 18, issue 04, 1289-1315
Abstract:
Online stream processing is an emerging research area in computer science. Semi-stream processing is a particular type of stream processing in which a stream of data is joined with a disk-based relation R, an operation that requires a semi-stream join operator. Many semi-stream joins use a queue of stream tuples to amortize the access cost for the disk-based relation, and use an index to allow directed access to the relation, avoiding the loading of unnecessary partitions of R. In such a situation, the question arises of which partitions of R should be accessed, as any stream tuple from the queue could serve as a lookup element for accessing the relation index. Existing algorithms use simple, safe, and correct strategies, but these strategies are not optimal in the sense of maximizing the join service rate. This paper makes two contributions. The first is an optimization: we analyze strategies for selecting an appropriate lookup element, particularly for skewed stream data, and show that a good selection strategy can significantly improve the service rate of existing join algorithms. The second is an extension: we develop a multi-stage join for semi-stream join algorithms. A multi-stage join is important when stream data needs to be joined with two or more tables in the relation, e.g., when a stream of sales data needs information added from the product and customer tables. To the best of our knowledge, none of the existing algorithms implements this feature. For the service rate evaluation we use two well-performing existing algorithms, CACHEJOIN and HYBRIDJOIN. We evaluate the service rate using real, TPC-H, and synthetic datasets with a known skewed distribution. We also present the cost model for our multi-stage join.
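The lookup-element question raised in the abstract can be illustrated with a toy model. The sketch below is not CACHEJOIN, HYBRIDJOIN, or the strategy proposed in the paper. It is a minimal simulation, assuming an in-memory dictionary stands in for the disk-based relation R, that R is partitioned by key range, and that each partition load costs one disk access. All names (`PARTITION_SIZE`, `load_partition`, `pick_oldest`, `pick_hottest_partition`, `run_join`) are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque

# Assumption: R is a small in-memory stand-in for the disk-based relation,
# partitioned by contiguous key ranges of PARTITION_SIZE keys each.
PARTITION_SIZE = 4
R = {key: f"row-{key}" for key in range(16)}


def partition_id(key):
    """Map a join key to the id of the R partition that would hold it."""
    return key // PARTITION_SIZE


def load_partition(pid):
    """Simulate one indexed disk read that returns a whole partition of R."""
    start = pid * PARTITION_SIZE
    return {k: R[k] for k in range(start, start + PARTITION_SIZE) if k in R}


def pick_oldest(queue):
    """Simple strategy: the oldest queued stream tuple is the lookup element."""
    return queue[0]


def pick_hottest_partition(queue):
    """Skew-aware strategy: pick a key from the partition most queued tuples need."""
    counts = Counter(partition_id(k) for k in queue)
    hottest = counts.most_common(1)[0][0]
    return next(k for k in queue if partition_id(k) == hottest)


def run_join(stream, pick, capacity=4):
    """Join stream keys against R; return (join output, number of partition loads)."""
    pending, queue = deque(stream), deque()
    output, loads = [], 0
    while pending or queue:
        # Refill the bounded queue from the incoming stream.
        while pending and len(queue) < capacity:
            queue.append(pending.popleft())
        pid = partition_id(pick(queue))
        partition = load_partition(pid)
        loads += 1
        # One load serves every queued tuple that falls in the loaded partition.
        remaining = deque()
        for key in queue:
            if partition_id(key) != pid:
                remaining.append(key)
            elif key in partition:
                output.append((key, partition[key]))
        queue = remaining
    return output, loads


# A small, hand-made skewed stream: key 1 dominates.
stream = [1, 9, 1, 2, 13, 1, 5, 2, 1, 0, 9, 1]
for strategy in (pick_oldest, pick_hottest_partition):
    joined, loads = run_join(stream, strategy)
    print(strategy.__name__, "tuples per load:", len(joined) / loads)
```

In this model the service rate corresponds to stream tuples joined per partition load, so a strategy that needs fewer loads for the same stream achieves a higher rate. Both strategies produce the same join result; only the order and number of disk accesses can differ.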
Keywords: Semi-stream data processing; multi-stage join; disk-based relation; indexing
Date: 2019
Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0219622019500214
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:ijitdm:v:18:y:2019:i:04:n:s0219622019500214
DOI: 10.1142/S0219622019500214
International Journal of Information Technology & Decision Making (IJITDM) is currently edited by Yong Shi
Bibliographic data for series maintained by Tai Tone Lim.