Optimizing machine learning for water safety: A comparative analysis with dimensionality reduction and classifier performance in potability prediction
Debashis Chatterjee,
Prithwish Ghosh,
Amlan Banerjee and
Shiladri Shekhar Das
PLOS Water, 2024, vol. 3, issue 8, 1-25
Abstract:
In this study, we investigated the effectiveness of machine learning techniques in predicting water potability from water quality attributes. We first applied seven classification methods directly to the original dataset, obtaining varying accuracy scores. The Support Vector Machine (SVM) achieved the highest accuracy, 69%, while other methods such as XGBoost, k-Nearest Neighbors, Gaussian Naive Bayes, and Random Forest were competitive, with scores ranging from 62% to 68%. We then used Principal Component Analysis (PCA) to reduce the dataset’s dimensionality to six principal components and reapplied the same machine learning techniques. Accuracy increased for every classifier, reaching nearly 100%. This study provides insight into the impact of dimensionality reduction on predictive accuracy and underscores the importance of selecting appropriate techniques for water potability prediction.
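The comparison described in the abstract (classifiers on the original features versus the same classifiers after PCA to six components) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data are synthetic (`make_classification` with an arbitrarily chosen nine features), the train/test split, scaling step, and default hyperparameters are assumptions, and XGBoost is left out to keep the example within a single library. The accuracies it prints say nothing about the results reported in the article.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data, not the water quality dataset.
X, y = make_classification(
    n_samples=600, n_features=9, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A subset of the classifier families named in the abstract, default settings.
classifiers = {
    "SVM": SVC(),
    "k-Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
}


def evaluate(use_pca: bool) -> dict:
    """Fit each classifier, optionally after PCA, and return test accuracy."""
    scores = {}
    for name, clf in classifiers.items():
        steps = [StandardScaler()]
        if use_pca:
            # Six components, as stated in the abstract. PCA is fitted on the
            # training split only because it sits inside the pipeline.
            steps.append(PCA(n_components=6))
        steps.append(clone(clf))
        model = make_pipeline(*steps)
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


baseline = evaluate(use_pca=False)
reduced = evaluate(use_pca=True)

for name in classifiers:
    print(f"{name}: original={baseline[name]:.3f}, PCA(6)={reduced[name]:.3f}")
```

Placing the scaler and PCA inside the pipeline means both are estimated from the training split alone, so the held-out accuracy is not influenced by test data. Whether the article's evaluation was set up this way is not stated in the abstract.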
Date: 2024
Downloads:
https://journals.plos.org/water/article?id=10.1371/journal.pwat.0000259 (text/html)
https://journals.plos.org/water/article/file?id=10 ... 00259&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pwat00:0000259
DOI: 10.1371/journal.pwat.0000259