A Machine Learning Predictive Model to Detect Water Quality and Pollution

Xu, Xiaoting; Lai, Tin; Jahan, Sayka; Farid, Farnaz; Bello, Abubakar

Xiaoting Xu, Tin Lai, Sayka Jahan, Farnaz Farid and Abubakar Bello
Additional contact information
Xiaoting Xu: School of Computer Science, The University of Sydney, Camperdown, NSW 2006, Australia
Tin Lai: School of Computer Science, The University of Sydney, Camperdown, NSW 2006, Australia
Sayka Jahan: Department of Environmental Sciences, Macquarie University, Sydney, NSW 2109, Australia
Farnaz Farid: School of Social Sciences, Western Sydney University, Penrith, NSW 2751, Australia
Abubakar Bello: School of Social Sciences, Western Sydney University, Penrith, NSW 2751, Australia

Future Internet, 2022, vol. 14, issue 11, 1-14

Abstract: The increasing prevalence of marine pollution over the past few decades has motivated recent research aimed at easing the situation. Typical water quality assessment requires continuous monitoring of water and sediments at remote locations, together with labour-intensive laboratory tests to determine the degree of pollution. We propose an automated water quality assessment framework in which we formalise a predictive model that uses machine learning to infer water quality and the level of pollution from collected water and sediment samples. Because sample collection locations are sparse, the number of water sediment samples is limited and the dataset is incomplete. Therefore, after an extensive investigation of the performance of various data imputation methods on water and sediment datasets with different missing data rates, we chose the best-performing imputation method to process the missing data. Each water sediment sample is then tagged with one of four pollution levels based on guidelines, and a machine learning classification model learns the relationship between the sample data and the assigned level. The predictions are then compared with the actual levels to check whether the model is sound and its predictions accurate. Finally, we give improvement advice based on the results obtained from model building. Empirically, we show that our best model achieves an accuracy of 75% after accounting for 57% of missing data. Experimentally, we show that our model would assist in automatically assessing water quality screening based on possibly incomplete real-world data.
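
The pipeline outlined in the abstract (impute missing values, classify each sample into one of four pollution levels, compare predictions with the actual levels) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the imputation method or the classifier the authors selected, so column-mean imputation and a k-nearest-neighbour classifier are used as stand-ins, and the function names and the tiny dataset are hypothetical.

```python
import math
from collections import Counter

# Illustrative stand-ins only: the paper's chosen imputation method and
# classifier are not stated in the abstract.

def column_means(rows):
    """Mean of the observed (non-None) values in each column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def impute(rows, means):
    """Replace each missing value (None) with the mean of its column."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def knn_predict(train_X, train_y, x, k=1):
    """Majority label among the k training samples closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

if __name__ == "__main__":
    # Hypothetical sediment measurements (two features per sample);
    # None marks a missing laboratory value. Labels 0-3 are pollution levels.
    train_raw = [[1.0, 2.0], [5.0, 6.0], [9.0, None], [13.0, 14.0]]
    train_y = [0, 1, 2, 3]
    test_raw = [[1.1, 2.1], [None, 13.5]]
    test_y = [0, 3]

    means = column_means(train_raw)      # fit the imputer on training data only
    train_X = impute(train_raw, means)
    test_X = impute(test_raw, means)

    predictions = [knn_predict(train_X, train_y, x) for x in test_X]
    print("accuracy:", accuracy(test_y, predictions))
```

The column means are computed on the training samples only and then reused for the test samples, so that no information from the held-out data leaks into the imputation step.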

Keywords: water pollution; artificial intelligence; marine pollution; machine learning model; deep learning model; data imputation
JEL-codes: O3
Date: 2022

Downloads:
https://www.mdpi.com/1999-5903/14/11/324/pdf (application/pdf)
https://www.mdpi.com/1999-5903/14/11/324/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:14:y:2022:i:11:p:324-:d:966701

Future Internet is currently edited by Ms. Grace You

Bibliographic data for series maintained by MDPI Indexing Manager.