Bayesian Inference of Pathogen Phylogeography using the Structured Coalescent Model

Roberts, Ian; Everitt, Richard G; Koskela, Jere; Didelot, Xavier

PLOS Computational Biology, 2025, vol. 21, issue 4, 1-37

Abstract: Over the past decade, pathogen genome sequencing has become well established as a powerful approach to study infectious disease epidemiology. In particular, when multiple genomes are available from several geographical locations, comparing them is informative about the relative sizes of the local pathogen populations, as well as about past migration rates and events between locations. The structured coalescent model has a long history of use as the underlying process for such phylogeographic analyses. However, the computational cost of this model does not scale well to the large numbers of genomes frequently analysed in pathogen genomic epidemiology studies. Several approximations of the structured coalescent model have been proposed, but their effects are difficult to predict. Here we show how the exact structured coalescent model can be used to analyse a precomputed dated phylogeny in order to perform Bayesian inference on the past migration history, the effective population size in each location, and the directed migration rates from any location to any other. We describe an efficient reversible jump Markov chain Monte Carlo scheme, implemented in a new R package, StructCoalescent. We use simulations to demonstrate the scalability and correctness of our method and to compare it with existing software. We also apply our new method to several state-of-the-art datasets on the population structure of real pathogens to showcase its relevance to current data scales and research questions.

Author summary: A virus may be present in several countries, but typically most transmission events take place within each country, with only a relatively small number of transmission events happening from one country to another. Such structure in the pathogen population affects the similarity between genomes. If the geographical structure is strong, then genomes collected from the same location will on average be more similar than genomes collected from different locations. We can also use this principle in reverse, to determine what the relationships between genomes (which we observe) imply about the pathogen population structure (which we do not observe but want to learn about). Here we present a new method to perform this task. We apply it to several simulated and real sets of pathogen genomes to reveal their underlying population structure. Knowing about pathogen population structure has important consequences for understanding the evolution and epidemiology of infectious disease pathogens, and therefore for informing the public health policies that can limit their burden.
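
As a rough illustration of what the exact structured coalescent model evaluates once the dated phylogeny is fixed, the following Python sketch computes the log-density of one fully specified migration history, i.e. a dated tree whose branches are annotated with locations (demes) and migration events. It is not code from the R package StructCoalescent: the names `Interval`, `structured_coalescent_loglik`, `pop_sizes` and `mig_rates` are hypothetical, and it assumes the common parameterisation in which each pair of lineages in deme i coalesces at rate 1/N_i and each lineage in deme i migrates, backwards in time, to deme j at rate m_ij. The paper's own parameterisation may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Interval:
    """One inter-event interval of an annotated tree, read backwards in time.

    duration: length of the interval
    lineages: number of lineages present in each deme during the interval
    event:    what ends the interval: ("coal", i), ("mig", i, j) or ("sample",)
    """
    duration: float
    lineages: list
    event: tuple


def structured_coalescent_loglik(intervals, pop_sizes, mig_rates):
    """Log-density of one fully specified migration history (hypothetical helper).

    pop_sizes[i]    -- effective population size N_i of deme i, in the tree's time units
    mig_rates[i][j] -- assumed backward-in-time rate at which one lineage in deme i
                       moves to deme j
    """
    n_demes = len(pop_sizes)
    loglik = 0.0
    for iv in intervals:
        # Total rate of any event during the interval: each of the k_i choose 2
        # pairs in deme i coalesces at rate 1/N_i, and each of the k_i lineages
        # leaves deme i at rate sum_{j != i} m_ij.
        total_rate = 0.0
        for i in range(n_demes):
            k = iv.lineages[i]
            total_rate += k * (k - 1) / 2.0 / pop_sizes[i]
            total_rate += k * sum(mig_rates[i][j] for j in range(n_demes) if j != i)
        # Log-probability that nothing happens for `duration` time units.
        loglik -= total_rate * iv.duration
        # Log-rate of the specific event that ends the interval.
        kind = iv.event[0]
        if kind == "coal":
            loglik -= math.log(pop_sizes[iv.event[1]])  # log(1 / N_i)
        elif kind == "mig":
            rate = mig_rates[iv.event[1]][iv.event[2]]
            if rate <= 0.0:
                return -math.inf  # history is impossible under these rates
            loglik += math.log(rate)
        # "sample" events are conditioned on and contribute no rate factor.
    return loglik


# Example: two demes; the lineage sampled in deme 1 migrates to deme 0 at
# t = 0.5, then coalesces with the deme-0 lineage at t = 1.5 (times backwards).
history = [
    Interval(0.5, [1, 1], ("mig", 1, 0)),
    Interval(1.0, [2, 0], ("coal", 0)),
]
ll = structured_coalescent_loglik(history, pop_sizes=[1.0, 2.0],
                                  mig_rates=[[0.0, 0.1], [0.2, 0.0]])
print(round(ll, 4))  # prints -2.9594
```

Here the log-density is -0.15 + log(0.2) - 1.2 = -1.35 + log(0.2). In a reversible jump MCMC over migration histories, a density of this kind would be re-evaluated for each proposed history; the sketch covers only this likelihood term, not the sampler or the priors.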

Date: 2025

Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012995 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 12995&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1012995

DOI: 10.1371/journal.pcbi.1012995

Publisher: Public Library of Science