EconPapers
 

A tool for data cube construction from structurally heterogeneous XML documents

Turkka Näppilä, Kalervo Järvelin and Timo Niemi

Journal of the American Society for Information Science and Technology, 2008, vol. 59, issue 3, 435-449

Abstract: Data cubes for OLAP (On‐Line Analytical Processing) often need to be constructed from data located in several distributed and autonomous information sources. Such a data integration process is challenging due to semantic, syntactic, and structural heterogeneity among the data. Although XML (extensible markup language) is the de facto standard for data exchange, all three types of heterogeneity remain. Moreover, popular path‐oriented XML query languages, such as XQuery, require the user to know the structure of the documents to be processed in considerable detail, and are thus impractical for many real‐world data integration tasks. Several Lowest Common Ancestor (LCA)‐based XML query evaluation strategies have recently been introduced to provide a more structure‐independent way to access XML documents. We show, however, that for certain, not uncommon, types of XML documents this approach leads to undesirable results. This article introduces a novel high‐level data extraction primitive that uses the purpose‐built Smallest Possible Context (SPC) query evaluation strategy. We demonstrate, through a system prototype for OLAP data cube construction and a sample application in informetrics, that our approach has real advantages in data integration.

Date: 2008

Downloads: https://doi.org/10.1002/asi.20756



Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:59:y:2008:i:3:p:435-449

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890


More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:jamist:v:59:y:2008:i:3:p:435-449