EconPapers    

Unsupervised detection and fitness estimation of emerging SARS-CoV-2 variants: Application to wastewater samples (ANRS0160)

Alexandra Lefebvre, Vincent Maréchal, Arnaud Gloaguen, The Obépine Consortium, Amaury Lambert and Yvon Maday

PLOS Computational Biology, 2025, vol. 21, issue 12, 1-22

Abstract: Repeated waves of emerging variants during the SARS-CoV-2 pandemic have highlighted the urgent need to collect longitudinal genomic data and to develop statistical methods, based on time series analyses, that detect new threatening lineages and estimate their fitness early. Most models study how the prevalence of particular lineages evolves over time and require a prior classification of sequences into lineages, which can introduce delays and biases. More recently, several authors have studied the evolution of mutation prevalence over time using alternative clustering approaches that avoid specific lineage classification. Most existing methods are either non-parametric or unsuited to pooled data such as wastewater samples. The analysis of wastewater samples has recently been identified as a valuable complement to clinical sample analysis; however, the pooled nature of the data raises specific statistical challenges. In this context, we propose an alternative unsupervised method for clustering mutations according to their frequency trajectories over time and for estimating group fitness from time series of pooled mutation prevalence data. Our model is a mixture model combining observed count data with latent group assignments, and we use the expectation-maximization (EM) algorithm for model selection and parameter estimation. Applied to time series of SARS-CoV-2 sequencing data collected from wastewater treatment plants in France between October 2020 and April 2021, our method agnostically groups mutations consistently with lineages B.1.160, Alpha, B.1.177 and Beta, and yields per-group selection coefficient estimates consistent with the viral dynamics in France reported by Nextstrain.
Moreover, our method detected the Alpha variant as threatening as early as supervised methods (which track specific mutations over time), with the notable difference that, being unsupervised, it requires no prior information on the set of mutations.

Author summary: The SARS-CoV-2 pandemic has been characterized by successive waves of emerging variants replacing previously dominant ones. A variant is characterized by a combination of mutations, some of which may be shared among related variants. Early detection of emerging variants is important for adapting public health responses to viral evolution. Wastewater surveillance has been highlighted as a valuable complement to clinical sample analysis, mainly because it reflects viral circulation at the population level: all infected individuals, whether symptomatic or not, contribute to wastewater samples. However, wastewater surveillance poses statistical challenges, because the viral genetic material is highly fragmented, incomplete and comes from multiple individuals. In this work we propose a method, suited to wastewater samples, that groups viral mutations according to their frequency trajectories over time in an agnostic manner, and that detects threatening variants without prior knowledge of their characteristic mutations as early as methods targeting known specific mutations.
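The abstract describes the model only as a mixture of observed count data with latent group assignments, fitted by EM. As a minimal sketch of that general technique, and not the authors' implementation, the code below assumes that the mutant read count of mutation j at time t is binomial given the sequencing depth, and that each latent group k follows a logistic frequency trajectory logit p_k(t) = a_k + s_k t, whose slope s_k is read as a per-group growth rate analogous to a selection coefficient. The function name `em_logistic_mixture`, the slope-based initialization and the synthetic data are hypothetical; choosing the number of groups K (the model-selection step mentioned in the abstract) is not shown.

```python
import numpy as np


def expit(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def em_logistic_mixture(y, n, t, K, n_iter=100, newton_steps=5):
    """Hypothetical EM sketch: cluster mutations by frequency trajectory.

    y, n : (J, T) arrays of mutant read counts and sequencing depths
    t    : (T,) array of sampling times
    Assumed model: y[j, t] ~ Binomial(n[j, t], p_k(t)) for the latent group k
    of mutation j, with logit p_k(t) = a_k + s_k * t.
    Returns mixing weights pi (K,), theta (K, 2) = (a_k, s_k) and
    responsibilities r (J, K).
    """
    J, T = y.shape
    X = np.column_stack([np.ones(T), t])  # (T, 2) design matrix

    # Initialization heuristic (an assumption): split mutations into K
    # equal-size groups by the least-squares slope of their raw frequency.
    freq = y / np.maximum(n, 1)
    tc = t - t.mean()
    slope = freq @ tc / (tc @ tc)
    ranks = np.argsort(np.argsort(slope))
    r = np.eye(K)[ranks * K // J]  # hard initial responsibilities
    theta = np.zeros((K, 2))

    for _ in range(n_iter):
        # M-step: mixing weights are the mean responsibilities.
        pi = r.mean(axis=0)
        # M-step: weighted logistic regression per group, via Newton's method
        # on responsibility-weighted successes and trials at each time point.
        for k in range(K):
            yk = r[:, k] @ y  # (T,)
            nk = r[:, k] @ n  # (T,)
            for _ in range(newton_steps):
                pk = expit(X @ theta[k])
                grad = X.T @ (yk - nk * pk)
                hess = X.T @ (X * (nk * pk * (1 - pk))[:, None])
                theta[k] += np.linalg.solve(hess + 1e-8 * np.eye(2), grad)

        # E-step: posterior group probabilities; the binomial coefficient is
        # identical for every group and cancels, so it is omitted.
        p = np.clip(expit(theta @ X.T), 1e-12, 1 - 1e-12)  # (K, T)
        loglik = y @ np.log(p).T + (n - y) @ np.log1p(-p).T  # (J, K)
        logw = loglik + np.log(np.maximum(pi, 1e-300))
        logw -= logw.max(axis=1, keepdims=True)  # stabilize before exp
        r = np.exp(logw)
        r /= r.sum(axis=1, keepdims=True)
    return pi, theta, r


# Synthetic demo: 20 rising and 20 declining mutations, 20 weekly samples.
rng = np.random.default_rng(1)
t = np.arange(20, dtype=float)
depth = 200
n = np.full((40, 20), depth)
p_up = expit(-3.0 + 0.4 * t)
p_down = expit(2.0 - 0.3 * t)
y = np.vstack([rng.binomial(depth, p_up, size=(20, 20)),
               rng.binomial(depth, p_down, size=(20, 20))])
pi, theta, r = em_logistic_mixture(y, n, t, K=2)
labels = r.argmax(axis=1)
print("group sizes:", np.bincount(labels, minlength=2))
print("estimated slopes:", theta[:, 1].round(3))
```

In this sketch the M-step reduces to one weighted logistic regression per group, because the responsibility-weighted binomial log-likelihood depends on the data only through the summed weighted counts and depths at each time point. A group with a clearly positive estimated slope would be the candidate "threatening" cluster.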

Date: 2025

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013749 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13749&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013749

DOI: 10.1371/journal.pcbi.1013749

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().

 
Page updated 2025-12-07
Handle: RePEc:plo:pcbi00:1013749