Rethinking Evaluation Metrics in Hydrological Deep Learning: Insights from Torrent Flow Velocity Prediction

Chen, Walter; Nguyen, Kieu Anh; Lin, Bor-Shiun

Walter Chen, Kieu Anh Nguyen and Bor-Shiun Lin
Additional contact information
Walter Chen: Department of Civil Engineering, National Taipei University of Technology, Taipei 10608, Taiwan
Kieu Anh Nguyen: Department of Civil Engineering, National Taipei University of Technology, Taipei 10608, Taiwan
Bor-Shiun Lin: Ultron Technology Engineering Company, Taipei 11072, Taiwan

Sustainability, 2025, vol. 17, issue 19, 1-16

Abstract: Accurate estimation of flow velocities in torrents and steep rivers is essential for flood risk assessment, sediment transport analysis, and the sustainable management of water resources. While deep learning models are increasingly applied to such tasks, their evaluation often depends on statistical metrics that may yield conflicting interpretations. The objective of this study is to clarify how different evaluation metrics influence the interpretation of hydrological deep learning models. We analyze two models that predict flow velocity in a torrential creek in Taiwan. Although the models differ in architecture, the critical distinction lies in the datasets used: the first model was trained on May–June data, whereas the second model incorporated May–August data. Four performance metrics were examined: root mean square error (RMSE), Nash–Sutcliffe efficiency (NSE), Willmott’s index of agreement (d), and mean absolute percentage error (MAPE). Quantitatively, the first model attained RMSE = 0.0471 m/s, NSE = 0.519, and MAPE = 7.78%, whereas the second model produced RMSE = 0.0572 m/s, NSE = 0.678, and MAPE = 11.56%. The results reveal a paradox. The first model achieved lower RMSE and MAPE, indicating predictions closer to the observed values, but its NSE fell below the 0.65 threshold often cited by reviewers as grounds for rejection. In contrast, the second model exceeded this NSE threshold and would likely be considered acceptable, despite producing larger errors in absolute terms. This paradox highlights the novelty of the study: model evaluation outcomes can be driven more by data variability and the choice of metric than by model architecture. It also underscores the risk of misinterpretation when a single metric is used in isolation. For sustainability-oriented hydrology, robust assessment requires reporting multiple metrics and interpreting them in a balanced manner to support disaster risk reduction, resilient water management, and climate adaptation.
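
The paradox described in the abstract can be traced to how NSE divides the squared error by the variance of the observations. The following is a minimal Python sketch, not the authors' code, of the four metrics under their standard textbook definitions. It assumes the original 1981 form of Willmott's d, since the abstract does not say which variant was used. All function names and the small synthetic arrays are hypothetical. The helper implied_obs_std rearranges the identity NSE = 1 - RMSE^2 / var(obs), which holds only when both metrics are computed on the same evaluation set.

```python
import numpy as np


def _pair(obs, sim):
    """Convert inputs to float arrays and check that their shapes match."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("obs and sim must have the same shape")
    return obs, sim


def rmse(obs, sim):
    """Root mean square error, in the units of the data (e.g. m/s)."""
    obs, sim = _pair(obs, sim)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum((obs - mean(obs))^2)."""
    obs, sim = _pair(obs, sim)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def willmott_d(obs, sim):
    """Willmott's index of agreement (assumed: original 1981 formulation)."""
    obs, sim = _pair(obs, sim)
    o_bar = obs.mean()
    potential = np.sum((np.abs(sim - o_bar) + np.abs(obs - o_bar)) ** 2)
    return float(1.0 - np.sum((obs - sim) ** 2) / potential)


def mape(obs, sim):
    """Mean absolute percentage error in percent; undefined if any obs is 0."""
    obs, sim = _pair(obs, sim)
    return float(100.0 * np.mean(np.abs((obs - sim) / obs)))


def implied_obs_std(rmse_value, nse_value):
    """Population std of the observations implied by an RMSE/NSE pair.

    Assumes both values come from the same evaluation set, so that
    NSE = 1 - RMSE^2 / var(obs).
    """
    return float(rmse_value / np.sqrt(1.0 - nse_value))


if __name__ == "__main__":
    # Hypothetical data: identical errors applied to a narrow and a wide series.
    err = np.array([0.05, -0.05, 0.05, -0.05, 0.05])
    narrow = np.array([0.4, 0.5, 0.6, 0.7, 0.8])
    wide = np.array([0.2, 0.5, 0.8, 1.1, 1.4])
    for name, obs in (("narrow", narrow), ("wide", wide)):
        sim = obs + err
        print(name, rmse(obs, sim), nse(obs, sim), willmott_d(obs, sim), mape(obs, sim))
```

In the synthetic demo both series have the same RMSE by construction, yet the wider series scores a higher NSE. Under the same-evaluation-set assumption, the values reported in the abstract imply an observation standard deviation of roughly 0.068 m/s for the first model's data and roughly 0.101 m/s for the second's, which is consistent with the abstract's point that greater data variability, rather than smaller errors, lifted the second model's NSE.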

Keywords: torrent flow velocity; deep learning; three-dimensional convolutional neural network; convolutional neural network with long short-term memory; root mean square error; Nash–Sutcliffe efficiency; Willmott’s index of agreement; mean absolute percentage error
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2025

Downloads:
https://www.mdpi.com/2071-1050/17/19/8658/pdf (application/pdf)
https://www.mdpi.com/2071-1050/17/19/8658/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:17:y:2025:i:19:p:8658-:d:1758793

Sustainability is currently edited by Ms. Alexandra Wu

Bibliographic data for series maintained by MDPI Indexing Manager.