VC-SLAM—A Handcrafted Data Corpus for the Construction of Semantic Models
Andreas Burgdorf,
Alexander Paulus,
André Pomp and
Tobias Meisen
Additional contact information
Andreas Burgdorf: Chair of Technologies and Management of Digital Transformation, University of Wuppertal, 42119 Wuppertal, Germany
Alexander Paulus: Chair of Technologies and Management of Digital Transformation, University of Wuppertal, 42119 Wuppertal, Germany
André Pomp: Chair of Technologies and Management of Digital Transformation, University of Wuppertal, 42119 Wuppertal, Germany
Tobias Meisen: Chair of Technologies and Management of Digital Transformation, University of Wuppertal, 42119 Wuppertal, Germany
Data, 2022, vol. 7, issue 2, 1-17
Abstract:
Ontology-based data management and knowledge graphs have emerged in recent years as efficient approaches for managing and utilizing large and diverse data sets. Automatic semantic labeling and modeling are a prerequisite for both, and research on algorithms for these tasks has made steady progress in the form of new approaches. The algorithms vary in the type of information used (data schema, values, or metadata), as well as in the underlying methodology (e.g., use of different machine learning methods or external knowledge bases). Approaches that have been established over the years, however, still come with various weaknesses. Most are evaluated on a few small data corpora specific to the approach. This reduces comparability and limits what can be said about the general applicability and performance of those approaches. Other research areas, such as computer vision or natural language processing, solve this problem by providing unified data corpora for the evaluation of specific algorithms and tasks. In this paper, we present and publish VC-SLAM to lay the necessary foundation for future research. This corpus allows the evaluation and comparison of semantic labeling and modeling approaches across different methodologies, and it is the first corpus that additionally allows textual data documentation to be leveraged for semantic labeling and modeling. Each of the 101 data sets it contains consists of labels, data, and metadata, as well as corresponding semantic labels and a semantic model that were manually created by human experts using an ontology built explicitly for the corpus. We provide statistical information about the corpus and a critical discussion of its strengths and shortcomings, and we test the corpus with existing methods for labeling and modeling.
Keywords: semantic labeling; semantic modeling; semantic mapping; data corpus
JEL-codes: C8 C80 C81 C82 C83
Date: 2022
Downloads:
https://www.mdpi.com/2306-5729/7/2/17/pdf (application/pdf)
https://www.mdpi.com/2306-5729/7/2/17/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:7:y:2022:i:2:p:17-:d:733302
Data is currently edited by Ms. Cecilia Yang
Bibliographic data for series maintained by MDPI Indexing Manager.