Machine Learning Assessment of Damage Grade for Post-Earthquake Buildings: A Three-Stage Approach Directly Handling Categorical Features

Li, Yutao; Jia, Chuanguo; Chen, Hong; Su, Hongchen; Chen, Jiahao; Wang, Duoduo

Additional contact information
Yutao Li: School of Civil Engineering, Chongqing University, Chongqing 400045, China
Chuanguo Jia: School of Civil Engineering, Chongqing University, Chongqing 400045, China
Hong Chen: School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Hongchen Su: School of Civil Engineering, Chongqing University, Chongqing 400045, China
Jiahao Chen: School of Civil Engineering, Chongqing University, Chongqing 400045, China
Duoduo Wang: School of Civil Engineering, Chongqing University, Chongqing 400045, China

Sustainability, 2023, vol. 15, issue 18, 1-23

Abstract: Rapid assessment of post-earthquake building damage for rescue and reconstruction is a crucial strategy for reducing the heavy human casualties and economic losses caused by earthquakes. Conventional machine learning (ML) approaches to this problem usually rely on one-hot encoding to cope with categorical features, and their overall procedure is neither sufficient nor comprehensive. This study therefore proposes a three-stage approach that handles categorical features directly and strengthens the entire methodology of ML applications. In stage I, an integrated data preprocessing framework involving subjective–objective feature selection is proposed and applied to a dataset of buildings after the 2015 Gorkha earthquake. In stage II, four machine learning models (KNN, XGBoost, CatBoost, and LightGBM) are trained and tested on the dataset, and the best model is selected using comprehensive metrics, including the proposed risk coefficient. In stage III, feature importance, the relationships between the features and the model’s output, and feature interaction effects are investigated with Shapley additive explanations. The results indicate that the LightGBM model has the best overall performance, with the highest accuracy (0.897), the lowest risk coefficient (0.042), and the shortest training time (12.68 s), owing to its algorithms for handling categorical features directly. As for interpretability, the most important features are identified, and the information obtained on their impacts and interactions improves the reliability of the ML models and promotes their use in practical engineering. The proposed three-stage approach can serve as a reference for the overall ML implementation process on raw datasets for similar problems.
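
The contrast the abstract draws between one-hot encoding and direct handling of categorical features can be illustrated with a minimal pure-Python sketch. It is not the authors' code and does not use the paper's dataset: the feature name `roof_type`, the toy values, and both helper functions are hypothetical. The second helper loosely mirrors the idea documented for LightGBM of sorting categories by a target statistic before searching for a split, simplified here to the mean target and squared error.

```python
from collections import defaultdict

# Toy data (made up for illustration, not from the Gorkha dataset).
roof_type = ["bamboo", "rc", "bamboo", "timber", "rc", "timber", "bamboo", "rc"]
damage = [3.0, 1.0, 3.0, 2.5, 1.0, 2.5, 3.0, 1.0]


def one_hot(values):
    """Expand one categorical column into one 0/1 column per category."""
    cats = sorted(set(values))
    return cats, [[1 if v == c else 0 for c in cats] for v in values]


def sse(ys):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_ordered_split(values, target):
    """Order categories by mean target, then scan prefixes of that order
    for the two-way partition with the lowest total squared error."""
    groups = defaultdict(list)
    for v, t in zip(values, target):
        groups[v].append(t)
    order = sorted(groups, key=lambda c: sum(groups[c]) / len(groups[c]))

    best_score, best_left = None, None
    for k in range(1, len(order)):
        left = set(order[:k])
        left_y = [t for v, t in zip(values, target) if v in left]
        right_y = [t for v, t in zip(values, target) if v not in left]
        score = sse(left_y) + sse(right_y)
        if best_score is None or score < best_score:
            best_score, best_left = score, left
    return order, best_score, best_left


cats, encoded = one_hot(roof_type)
order, score, left_set = best_ordered_split(roof_type, damage)
print(f"one-hot columns: {len(cats)}")
print(f"category order: {order}, left branch: {sorted(left_set)}")
```

With one-hot encoding the column count grows with the number of categories, whereas the ordered scan keeps a single column and evaluates only one candidate split per boundary in the sorted order.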

Keywords: building damage assessment; earthquake disaster; categorical feature; machine learning; LightGBM; interpretability method; Shapley additive explanation
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2023

Downloads:
https://www.mdpi.com/2071-1050/15/18/13847/pdf (application/pdf)
https://www.mdpi.com/2071-1050/15/18/13847/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:15:y:2023:i:18:p:13847-:d:1242027

Sustainability is currently edited by Ms. Alexandra Wu

Bibliographic data for series maintained by MDPI Indexing Manager.