Semantic-Similarity-Based Schema Matching for Management of Building Energy Data
Zhiyu Pan,
Guanchen Pan and
Antonello Monti
Additional contact information
Zhiyu Pan: Institute for Automation of Complex Power Systems, RWTH Aachen University, 52074 Aachen, Germany
Guanchen Pan: Institute for Automation of Complex Power Systems, RWTH Aachen University, 52074 Aachen, Germany
Antonello Monti: Institute for Automation of Complex Power Systems, RWTH Aachen University, 52074 Aachen, Germany
Energies, 2022, vol. 15, issue 23, 1-23
Abstract:
The growing volume of heterogeneous data in the building energy domain makes data integration difficult. Schema matching, which maps raw building energy data to a generic data model, is a necessary step in data integration and provides a unified representation. Little labeled data exists for schema matching, and labeling data manually is time-consuming and labor-intensive. This paper applies semantic-similarity methods from natural language processing to automate schema mapping, which reduces the manual effort of integrating heterogeneous data. Active learning is applied to address the shortage of labeled data in schema matching. Results on building energy data show that a pre-trained language model substantially improves schema-matching accuracy and that active learning greatly reduces the amount of labeled data required.
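The general idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes character trigram vectors as a simple stand-in for the pre-trained language model embeddings, and margin-based uncertainty sampling as one common form of active learning. All function names, field names, and target attributes below are hypothetical.

```python
import math
import re
from collections import Counter


def char_ngrams(text, n=3):
    """Return a bag of character n-grams for a normalized name.

    Assumption: this stands in for an embedding from a pre-trained
    language model, which the paper uses but this sketch does not.
    """
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    padded = f" {cleaned} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_schema(raw_fields, target_attributes):
    """Map each raw field to (best attribute, best score, margin).

    The margin is the gap between the best and second-best scores;
    a small margin signals an uncertain match.
    """
    matches = {}
    for field in raw_fields:
        scored = sorted(
            ((cosine(char_ngrams(field), char_ngrams(attr)), attr)
             for attr in target_attributes),
            reverse=True,
        )
        best_score, best_attr = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else 0.0
        matches[field] = (best_attr, best_score, best_score - runner_up)
    return matches


def select_for_labeling(matches, budget=1):
    """Pick the fields with the smallest margin for manual labeling."""
    return sorted(matches, key=lambda field: matches[field][2])[:budget]


# Hypothetical raw field names and generic data-model attributes.
raw_fields = ["room_temp_C", "elec_power_kW", "T_supply"]
target_attributes = ["room temperature", "electrical power", "supply temperature"]

matches = match_schema(raw_fields, target_attributes)
to_label = select_for_labeling(matches, budget=1)
```

In an active-learning loop, the fields returned by `select_for_labeling` would be labeled by a person and fed back to improve the matcher, so that manual effort goes to the least certain matches first.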
Keywords: semantic similarity; schema matching; active learning
JEL-codes: Q Q0 Q4 Q40 Q41 Q42 Q43 Q47 Q48 Q49
Date: 2022
Downloads:
https://www.mdpi.com/1996-1073/15/23/8894/pdf (application/pdf)
https://www.mdpi.com/1996-1073/15/23/8894/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jeners:v:15:y:2022:i:23:p:8894-:d:983389
Energies is currently edited by Ms. Agatha Cao
Bibliographic data for series maintained by MDPI Indexing Manager.