EconPapers    
Economics at your fingertips  
 

Predicting oil contamination in water using machine learning on microbial compositions

Tong Gao, Isaac Bigcraft, Stephen Techtmann and Issei Nakamura

PLOS ONE, 2026, vol. 21, issue 3, 1-16

Abstract: We present a compact and generative machine-learning framework that predicts oil contamination based on microbial community compositions from experimental samples. Our method combines dimensionality reduction with data augmentation and generative modeling to address high-dimensional, non-linear, and sparse microbial data. To reduce the 503-dimensional bacterial composition dataset, we compared three dimensionality reduction techniques: feature importance from random forest, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE). Feature importance outperformed PCA and t-SNE, improving predictive performance and identifying microbial species most strongly correlated with oil contamination. To mitigate data scarcity, we augmented the training data using an augmented data neural network (ADNN) with noise injection. Samples generated by a variational autoencoder (VAE) were used as controlled perturbations to probe model robustness during stress testing. Using the top 3–10 bacterial features, our model achieved an R² value of up to 0.99 in both training and stress testing for predicting oil contamination from microbial data. In a bottle-level hold-out evaluation (22 splits at an 80/20 bottle ratio), performance on held-out bottles was lower and variable (mean test R² = −0.150), indicating limited generalization within this cohort. These results should be interpreted as a feasibility demonstration requiring validation on larger independent datasets.

Date: 2026
References: Add references at CitEc
Citations:

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0344571 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 44571&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0344571

DOI: 10.1371/journal.pone.0344571

Access Statistics for this article

More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().

 
Page updated 2026-03-22
Handle: RePEc:plo:pone00:0344571