Classification of Building Types in Germany: A Data-Driven Modeling Approach
Abhilash Bandam,
Eedris Busari,
Chloi Syranidou,
Jochen Linssen and
Detlef Stolten
Additional contact information
Abhilash Bandam: IEK-3—Techno-Economic Systems Analysis, Institute of Energy and Climate Research, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
Eedris Busari: IEK-3—Techno-Economic Systems Analysis, Institute of Energy and Climate Research, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
Chloi Syranidou: IEK-3—Techno-Economic Systems Analysis, Institute of Energy and Climate Research, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
Jochen Linssen: IEK-3—Techno-Economic Systems Analysis, Institute of Energy and Climate Research, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
Detlef Stolten: IEK-3—Techno-Economic Systems Analysis, Institute of Energy and Climate Research, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
Data, 2022, vol. 7, issue 4, 1-23
Abstract:
Building-level details play an essential part in a number of real-world application models. Energy systems, telecommunications, disaster management, the internet of things, health care, and marketing are a few of the many applications that require building information. The essential variables that most of these models require are building type, house type, living-space area, and number of residents. To acquire some of this information, this paper introduces a methodology and generates the corresponding data. The study was conducted for specific applications in energy system modeling; nonetheless, the data can also be used in other applications. Building locations and some of their details are openly available as map data from OpenStreetMap (OSM). However, building types (e.g., residential, industrial, office, single-family house, multi-family house) are only partially available in the OSM dataset. Therefore, a machine learning classification algorithm was introduced to predict building types from the OSM buildings' data. Although the OSM dataset is the fundamental and most crucial one used for modeling, the algorithm was trained on a dataset prepared by combining several features from three other datasets. The generated dataset consists of approximately 29 million buildings, of which about 19 million are residential; 72% of these are single-family houses, and the rest are multi-family houses, a category that includes two-family houses and apartment buildings. Furthermore, the results were validated through comparison with publicly available statistical data. This comparison with official statistics yields percentage errors of 3.64% for residential buildings, 13.14% for single-family houses, and −15.38% for multi-family houses.
Nevertheless, by incorporating building types, this dataset can complement existing building information in studies where building-type information is crucial.
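The validation figures above compare modeled building counts with official statistics, but the abstract does not state the formula or its sign convention. The sketch below assumes the common definition (modeled − reference) / reference × 100, under which a negative value such as the −15.38% for multi-family houses would indicate that the model under-counts relative to the statistics. It also reproduces the approximate residential split implied by the quoted figures (about 19 million residential buildings, 72% single-family). The function names are hypothetical, and the percentage-error inputs are illustrative numbers, not data from the paper.

```python
def percentage_error(modeled: float, reference: float) -> float:
    """Relative deviation of a modeled count from an official reference, in percent.

    Assumed sign convention (not stated in the abstract): positive means the
    model over-counts relative to the statistic, negative means it under-counts.
    """
    if reference == 0:
        raise ValueError("reference count must be non-zero")
    # Multiply before dividing to keep integer-valued inputs exact.
    return (modeled - reference) * 100.0 / reference


def split_residential(total: int, single_family_share: float) -> tuple[int, int]:
    """Split a residential total into (single-family, multi-family) counts."""
    single = round(total * single_family_share)
    return single, total - single


if __name__ == "__main__":
    # Approximate figures quoted in the abstract.
    sfh, mfh = split_residential(19_000_000, 0.72)
    print(sfh, mfh)  # 13680000 5320000

    # Hypothetical counts, used only to illustrate the sign convention.
    print(percentage_error(110, 100))  # 10.0
    print(percentage_error(85, 100))   # -15.0
```

Under these assumptions, roughly 13.7 million single-family and 5.3 million multi-family buildings follow from the rounded figures; the exact counts reported in the paper may differ because "about 19 million" is itself rounded.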
Keywords: missing values; class imbalance; data analysis; geospatial data; feature selection; data visualization; classification; energy system analysis
JEL-codes: C8 C80 C81 C82 C83
Date: 2022
Downloads:
https://www.mdpi.com/2306-5729/7/4/45/pdf (application/pdf)
https://www.mdpi.com/2306-5729/7/4/45/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:7:y:2022:i:4:p:45-:d:790255
Data is currently edited by Ms. Cecilia Yang
More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.