Prediction of Cavity Length Using an Interpretable Ensemble Learning Approach

Guo, Ganggui; Li, Shanshan; Liu, Yakun; Cao, Ze; Deng, Yangyu

Prediction of Cavity Length Using an Interpretable Ensemble Learning Approach

Ganggui Guo, Shanshan Li, Yakun Liu (), Ze Cao and Yangyu Deng
Additional contact information
Ganggui Guo: School of Hydraulic Engineering, Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, China
Shanshan Li: Conservancy and Hydropower Engineering, Xi’an University of Technology, Xian 710048, China
Yakun Liu: School of Hydraulic Engineering, Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, China
Ze Cao: School of Hydraulic Engineering, Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, China
Yangyu Deng: School of Hydraulic Engineering, Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, China

IJERPH, 2022, vol. 20, issue 1, 1-14

Abstract: The cavity length, which is a vital index in aeration and corrosion reduction engineering, is affected by many factors and is challenging to calculate. In this study, 10-fold cross-validation was performed to select the optimal input configuration. Additionally, the hyperparameters of three ensemble learning models—random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting tree (XGBOOST)—were fine-tuned by the Bayesian optimization (BO) algorithm to improve the prediction accuracy and compare the five empirical methods. The XGBOOST method was observed to present the highest prediction accuracy. Further interpretability analysis carried out using the Sobol method demonstrated its ability to reasonably capture the varying relative significance of different input features under different flow conditions. The Sobol sensitivity analysis also observed two patterns of extracting information from the input features in ML models: (1) the main effect of individual features in ensemble learning and (2) the interactive effect between each feature in SVR. From the results, the models obtaining individual information both predict the cavity length more accurately than that using interactive information. Subsequently, the XGBOOST captures more correct information from features, which leads to the varied Sobol index in accordance with outside phenomena; meanwhile, the predicted results fit the experimental points best.

Keywords: cavity length; optimization algorithm; interpretable model; ensemble learning model (search for similar items in EconPapers)
JEL-codes: I I1 I3 Q Q5 (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/1660-4601/20/1/702/pdf (application/pdf)
https://www.mdpi.com/1660-4601/20/1/702/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:20:y:2022:i:1:p:702-:d:1020647

Access Statistics for this article

IJERPH is currently edited by Ms. Jenna Liu

More articles in IJERPH from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().