XHQE: A hybrid system for scalable selectivity estimation of XML queries

El-Alfy, E.-S. M.; Mohammed, S.; Barradah, A. F.

XHQE: A hybrid system for scalable selectivity estimation of XML queries

E.-S. M. El-Alfy (), S. Mohammed () and A. F. Barradah ()
Additional contact information
E.-S. M. El-Alfy: King Fahd University of Petroleum & Minerals
S. Mohammed: King Fahd University of Petroleum & Minerals
A. F. Barradah: Exploration Network Operations Department, Saudi ARAMCO

Information Systems Frontiers, 2016, vol. 18, issue 6, No 14, 1233-1249

Abstract: Abstract With the increasing popularity of XML applications in enterprise and big data systems, the use of efficient query optimizers is becoming very essential. The performance of an XML query optimizer depends heavily on the query selectivity estimators it uses to find the best possible query execution plan. In this work, we propose a novel selectivity estimator which is a hybrid of structural synopsis and statistics, called XHQE. The structural synopsis enhances the accuracy of estimation and the structural statistics makes it scalable to the allocated memory space. The structural synopsis is generated by labeling the nodes of the source XML dataset using a fingerprint function and merging subtrees with similar fingerprints (i.e. having similar structures). The generated structural synopsis and structural statistics are then used to estimate the selectivity of given queries. We studied the performance of the proposed approach using different types of queries and four benchmark datasets with different structural characteristics. We compared XHQE with existing algorithms such as Sampling, TreeSketch and one histogram-based algorithm. The experimental results showed that the XHQE is significantly better than other algorithms in terms of estimation accuracy and scalability for semi-uniform datasets. For non-uniform datasets, the proposed algorithm has comparable estimation accuracy to TreeSketch as the allocated memory size is highly reduced, yet the estimation data generation time of the proposed approach is much lower (e.g., TreeSketch took more than 50 times longer than that of the proposed approach for XMark dataset). Comparing to the histogram-based algorithm, our approach supports regular twig quires in addition to having higher accuracy when both run under similar memory constraints.

Keywords: XML query processing; Query optimization; Selectivity estimation; Twig pattern matching; Structural synopsis (search for similar items in EconPapers)
Date: 2016
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://link.springer.com/10.1007/s10796-015-9561-6 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:infosf:v:18:y:2016:i:6:d:10.1007_s10796-015-9561-6

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10796

DOI: 10.1007/s10796-015-9561-6

Access Statistics for this article

Information Systems Frontiers is currently edited by Ram Ramesh and Raghav Rao

More articles in Information Systems Frontiers from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().