Ensemble Decision Tree Models Using RUSBoost for Estimating Risk of Iron Failure in Drinking Water Distribution Systems
S. R. Mounce,
K. Ellis,
J. M. Edwards,
V. L. Speight,
N. Jakomis and
J. B. Boxall
Additional contact information
S. R. Mounce: University of Sheffield
K. Ellis: University of Sheffield
J. M. Edwards: University of Sheffield
V. L. Speight: University of Sheffield
N. Jakomis: Dŵr Cymru Welsh Water
J. B. Boxall: University of Sheffield
Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), 2017, vol. 31, issue 5, No 10, 1575-1589
Abstract:
Safe, trusted drinking water is fundamental to society. Discolouration is a key aesthetic indicator visible to customers. Investigations to understand discolouration and iron failures in water supply systems require assessment of large quantities of disparate, inconsistent, multidimensional data from multiple corporate systems. A comprehensive data matrix was assembled for a seven-year period across the whole of a UK water company (serving three million people). From this, a novel data-driven tool for assessment of iron risk was developed, based on a yearly update and ranking procedure, for a subset of the best quality data. To avoid a ‘black box’ output, and to provide an element of explanatory (human-readable) interpretation, classification decision trees were utilised. Due to the very limited number of iron failures, results from many weak learners were melded into one high-quality ensemble predictor using the RUSBoost algorithm, which is designed for class imbalance. Results, exploring simplicity vs predictive power, indicate enough discrimination between variable relationships in the matrix to produce ensemble decision tree classification models with good accuracy for iron failure estimation at District Management Area (DMA) scale. Two model variants were explored: ‘Nowcast’ (situation at end of calendar year) and ‘Futurecast’ (predict end of next year situation from this year’s data). The Nowcast 2014 model achieved 100% True Positive Rate (TPR) and 95.3% True Negative Rate (TNR), with 3.3% of DMAs classified High Risk for un-sampled instances. The Futurecast 2014 model achieved 60.5% TPR and 75.9% TNR, with 25.7% of DMAs classified High Risk for un-sampled instances. The output can be used to focus preventive measures to improve iron compliance.
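As a rough illustration of two ideas named in the abstract, the sketch below shows (a) the random undersampling of the majority class that RUSBoost performs before fitting each weak learner, and (b) how TPR and TNR are computed from predicted and observed labels. It is a minimal sketch and not the authors' implementation: the function names `random_undersample` and `tpr_tnr`, the `ratio` parameter, the label coding (1 = iron failure, 0 = no failure) and the toy data are all assumptions made for illustration, and the boosting loop itself is omitted.

```python
import random


def random_undersample(labels, ratio=1.0, seed=0):
    """Return sorted indices that keep every minority sample (label 1)
    and a random subset of majority samples (label 0).

    `ratio` is a hypothetical parameter: the number of majority samples
    kept per minority sample.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more majority samples than actually exist.
    n_keep = min(len(majority), int(round(len(minority) * ratio)))
    kept = rng.sample(majority, n_keep)
    return sorted(minority + kept)


def tpr_tnr(y_true, y_pred):
    """Compute (True Positive Rate, True Negative Rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Toy, made-up labels: 3 failures among 20 DMAs (not data from the paper).
toy_labels = [1] * 3 + [0] * 17
balanced_idx = random_undersample(toy_labels, ratio=1.0, seed=42)

# Toy predictions to exercise the metric function.
rates = tpr_tnr([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

In a full RUSBoost run, a fresh undersampled subset would be drawn in every boosting round and the weak learner's errors would update the sample weights as in AdaBoost; only the resampling and the evaluation metrics are sketched here.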
Keywords: Water distribution systems; Water quality; Iron; Machine learning; Ensemble decision trees; RUSBoost
Date: 2017
Downloads: (external link)
http://link.springer.com/10.1007/s11269-017-1595-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:waterr:v:31:y:2017:i:5:d:10.1007_s11269-017-1595-8
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11269
DOI: 10.1007/s11269-017-1595-8
Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA) is currently edited by G. Tsakiris