Optimizing machine learning for water safety: A comparative analysis with dimensionality reduction and classifier performance in potability prediction

Chatterjee, Debashis; Ghosh, Prithwish; Banerjee, Amlan; Das, Shiladri Shekhar

PLOS Water, 2024, vol. 3, issue 8, 1-25

Abstract: In this study, we investigated the effectiveness of machine learning techniques in predicting water potability from water quality attributes. We first applied seven classification methods directly to the original dataset, which yielded varying accuracy scores. The Support Vector Machine (SVM) achieved the highest accuracy, 69%, while other methods such as XGBoost, k-Nearest Neighbors, Gaussian Naive Bayes, and Random Forest were competitive, with scores ranging from 62% to 68%. We then used Principal Component Analysis (PCA) to reduce the dataset's dimensionality to six principal components and reapplied the same classifiers. Accuracy increased across all classifiers, reaching nearly 100%. This study provides insight into the impact of dimensionality reduction on predictive accuracy and underscores the importance of selecting appropriate techniques for water potability prediction.
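The two-stage comparison described in the abstract (a classifier on the original features, then the same classifier after PCA to six components) could be sketched roughly as follows. This is a minimal illustration and not the authors' code. It assumes scikit-learn, uses a synthetic dataset from `make_classification` as a stand-in for the water quality data, and picks the feature count, split ratio, and standardization step arbitrarily. Fitting the scaler and PCA inside a pipeline on the training split only is also a choice made here, not something the abstract specifies.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a water quality table (hypothetical data,
# not the dataset used in the study). Nine features is an assumption.
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=0
)

# Hold out a test split so both variants are scored on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def fit_and_score(model):
    """Fit on the training split and return accuracy on the test split."""
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# Stage 1: SVM on the standardized original features.
baseline = make_pipeline(StandardScaler(), SVC())

# Stage 2: the same SVM after projecting onto six principal components.
with_pca = make_pipeline(StandardScaler(), PCA(n_components=6), SVC())

acc_baseline = fit_and_score(baseline)
acc_pca = fit_and_score(with_pca)

print(f"SVM, original features: {acc_baseline:.3f}")
print(f"SVM, 6 PCA components:  {acc_pca:.3f}")
```

The accuracies printed here depend entirely on the synthetic data and say nothing about the figures reported in the abstract. Swapping `SVC()` for another estimator (for example `RandomForestClassifier()` or `KNeighborsClassifier()`) would extend the same comparison to other classifiers.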

Date: 2024

Downloads: (external link)
https://journals.plos.org/water/article?id=10.1371/journal.pwat.0000259 (text/html)
https://journals.plos.org/water/article/file?id=10 ... 00259&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pwat00:0000259

DOI: 10.1371/journal.pwat.0000259

More articles in PLOS Water from Public Library of Science