An Approach to Extracting Topic-guided Views from the Sources of a Data Lake
Claudia Diamantini,
Paolo Lo Giudice,
Domenico Potena,
Emanuele Storti and
Domenico Ursino
Additional contact information
Claudia Diamantini: DII, Polytechnic University of Marche
Paolo Lo Giudice: DIIES, University “Mediterranea” of Reggio Calabria
Domenico Potena: DII, Polytechnic University of Marche
Emanuele Storti: DII, Polytechnic University of Marche
Domenico Ursino: DII, Polytechnic University of Marche
Information Systems Frontiers, No 0, 20 pages
Abstract:
In recent years, data lakes have emerged as an effective and efficient means of supporting information and knowledge extraction from a huge number of highly heterogeneous and rapidly changing data sources. Data lake management requires new techniques that differ substantially from those adopted for data warehouses in the past. In this scenario, one of the most challenging issues is the extraction of topic-guided (i.e., thematic) views from the (very heterogeneous and often unstructured) sources of a data lake. In this paper, we propose a new network-based model to uniformly represent the structured, semi-structured and unstructured sources of a data lake. Then, we present a new approach to "structuring", at least partially, unstructured data. Finally, we define a technique to extract topic-guided views from the sources of a data lake, based on similarity and other semantic relationships among source metadata.
Keywords: Data lakes; Unstructured data sources; Metadata management; Thematic views; Semantic similarities; DBpedia
Downloads: (external link)
http://link.springer.com/10.1007/s10796-020-10010-x Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:infosf:v::y::i::d:10.1007_s10796-020-10010-x
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10796
DOI: 10.1007/s10796-020-10010-x
Information Systems Frontiers is currently edited by Ram Ramesh and Raghav Rao
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.