Early detection of emerging SARS-CoV-2 Variants from wastewater through genome sequencing and machine learning
Xiaowei Zhuang,
Vo Van,
Michael A. Moshi,
Ketan Dhede,
Nabih Ghani,
Shahraiz Akbar,
Ching-Lan Chang,
Angelia K. Young,
Erin Buttery,
William Bendik,
Hong Zhang,
Salman Afzal,
Duane Moser,
Dietmar Cordes,
Cassius Lockett,
Daniel Gerrity,
Horng-Yuan Kan and
Edwin C. Oh
Additional contact information
Xiaowei Zhuang: University of Nevada Las Vegas
Vo Van: University of Nevada Las Vegas
Michael A. Moshi: University of Nevada Las Vegas
Ketan Dhede: University of Nevada Las Vegas
Nabih Ghani: University of Nevada Las Vegas
Shahraiz Akbar: University of Nevada Las Vegas
Ching-Lan Chang: University of Nevada Las Vegas
Angelia K. Young: Southern Nevada Health District
Erin Buttery: Southern Nevada Health District
William Bendik: Southern Nevada Health District
Hong Zhang: Southern Nevada Health District
Salman Afzal: Southern Nevada Health District
Duane Moser: Desert Research Institute
Dietmar Cordes: Cleveland Clinic Lou Ruvo Center for Brain Health
Cassius Lockett: Southern Nevada Health District
Daniel Gerrity: P.O. Box 99954
Horng-Yuan Kan: Southern Nevada Health District
Edwin C. Oh: University of Nevada Las Vegas
Nature Communications, 2025, vol. 16, issue 1, 1-12
Abstract:
Genome sequencing from wastewater enables accurate and cost-effective identification of SARS-CoV-2 variants. However, existing computational pipelines have limitations in detecting emerging variants not yet characterized in humans. Here, we present an unsupervised learning approach that clusters co-varying and time-evolving mutation patterns to identify SARS-CoV-2 variants. To build our model, we sequence 3659 wastewater samples collected over two years from urban and rural locations in Southern Nevada. We then develop a multivariate independent component analysis (ICA)-based pipeline to transform mutation frequencies into independent sources. These data-driven time-evolving and co-varying sources are compared to 8810 SARS-CoV-2 clinical genomes from Nevadans. Our method accurately detects the Delta variant in late 2021, Omicron variants in 2022, and emerging recombinant XBB variants in 2023. Our approach also reveals the spatial and temporal dynamics of variants in both urban and rural regions; achieves earlier detection of most variants compared to other computational tools; and uncovers unique co-varying mutation patterns not associated with any known variant. The multivariate nature of our pipeline boosts statistical power and supports accurate early detection of SARS-CoV-2 variants. This feature offers a unique opportunity to detect emerging variants and pathogens, even in the absence of clinical testing.
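Illustration (not part of the published abstract): the idea of transforming mutation frequencies into independent sources can be sketched on synthetic data. The sketch below is not the authors' pipeline. It assumes a weeks-by-mutations frequency matrix, three hypothetical variants with sparse random mutation signatures, Gaussian-shaped prevalence curves, and a plain symmetric FastICA with a tanh nonlinearity written in NumPy. All names (`fast_ica`, `signatures`, `prevalence`, `freq`) and all numeric settings are assumptions made for illustration only.

```python
import numpy as np


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W


def fast_ica(X, n_components, n_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on X of shape (channels, observations).

    Returns the estimated sources, shape (n_components, observations), and
    the estimated mixing matrix, shape (channels, n_components).
    """
    rng = np.random.default_rng(seed)
    n_obs = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)            # center each channel
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    # PCA whitening: keep the leading components and give them unit variance.
    K = (U[:, :n_components] / s[:n_components]).T * np.sqrt(n_obs)
    Z = K @ Xc
    W = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[g(Wz) z^T] - diag(E[g'(Wz)]) W, then re-orthonormalize.
        W_new = sym_decorrelate(
            G @ Z.T / n_obs - np.diag((1.0 - G**2).mean(axis=1)) @ W
        )
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ Z, np.linalg.pinv(W @ K)


# Synthetic stand-in for wastewater data (all values are assumptions).
rng = np.random.default_rng(42)
n_weeks, n_mutations, n_variants = 50, 300, 3
weeks = np.arange(n_weeks)

# Each hypothetical variant carries a sparse random set of mutations.
signatures = (rng.random((n_variants, n_mutations)) < 0.1).astype(float)

# Each hypothetical variant rises and falls around a different week.
centers = np.array([10.0, 25.0, 40.0])
prevalence = np.exp(-0.5 * ((weeks[:, None] - centers[None, :]) / 5.0) ** 2)

# Observed mutation frequencies: mixture of variants plus measurement noise.
noise = 0.01 * rng.standard_normal((n_weeks, n_mutations))
freq = np.clip(prevalence @ signatures + noise, 0.0, 1.0)

# Weeks act as channels and mutations as observations, so each source is a
# co-varying mutation pattern and each mixing column is its time course.
sources, time_courses = fast_ica(freq, n_components=n_variants)

# Compare recovered patterns with the simulated signatures (sign and order are arbitrary).
corr = np.abs(np.corrcoef(np.vstack([signatures, sources]))[:n_variants, n_variants:])
match = corr.argmax(axis=1)
print("best |correlation| per simulated variant:", np.round(corr.max(axis=1), 3))
```

In this toy setting each recovered source is a group of mutations that rise and fall together, and the matching column of `time_courses` describes when that group was present. ICA recovers sources only up to sign, scale, and ordering, which is why the comparison uses absolute correlations.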
Date: 2025
Downloads:
https://www.nature.com/articles/s41467-025-61280-5 Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-61280-5
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-025-61280-5
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.