EconPapers

VC-SLAM—A Handcrafted Data Corpus for the Construction of Semantic Models

Andreas Burgdorf, Alexander Paulus, André Pomp and Tobias Meisen
Affiliations:
Andreas Burgdorf: Chair of Technologies and Management of Digital Transformation, University of Wuppertal, 42119 Wuppertal, Germany
Alexander Paulus: Chair of Technologies and Management of Digital Transformation, University of Wuppertal, 42119 Wuppertal, Germany
André Pomp: Chair of Technologies and Management of Digital Transformation, University of Wuppertal, 42119 Wuppertal, Germany
Tobias Meisen: Chair of Technologies and Management of Digital Transformation, University of Wuppertal, 42119 Wuppertal, Germany

Data, 2022, vol. 7, issue 2, 1-17

Abstract: Ontology-based data management and knowledge graphs have emerged in recent years as efficient approaches for managing and utilizing large and diverse data sets. Research on algorithms for automatic semantic labeling and modeling, a prerequisite for both, has made steady progress and produced a range of new approaches. These algorithms differ in the type of information they use (data schema, values, or metadata) and in their underlying methodology (e.g., different machine learning methods or external knowledge bases). However, even approaches that have become established over the years still have various weaknesses. Most are evaluated on only a few small data corpora that are specific to the approach. This reduces comparability and limits what can be concluded about the general applicability and performance of those approaches. Other research areas, such as computer vision or natural language processing, solve this problem by providing unified data corpora for evaluating specific algorithms and tasks. In this paper, we present and publish VC-SLAM to lay the necessary foundation for future research. This corpus allows semantic labeling and modeling approaches to be evaluated and compared across different methodologies, and it is the first corpus that additionally makes it possible to leverage textual data documentation for semantic labeling and modeling. Each of its 101 data sets consists of labels, data, and metadata, as well as corresponding semantic labels and a semantic model that were manually created by human experts using an ontology built explicitly for the corpus. We provide statistical information about the corpus, critically discuss its strengths and shortcomings, and test the corpus with existing labeling and modeling methods.

Keywords: semantic labeling; semantic modeling; semantic mapping; data corpus
JEL-codes: C8 C80 C81 C82 C83
Date: 2022

Downloads:
https://www.mdpi.com/2306-5729/7/2/17/pdf (application/pdf)
https://www.mdpi.com/2306-5729/7/2/17/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:7:y:2022:i:2:p:17-:d:733302

Data is currently edited by Ms. Cecilia Yang

More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

Page updated 2025-03-19
Handle: RePEc:gam:jdataj:v:7:y:2022:i:2:p:17-:d:733302