Bayesian reconstruction of transmission trees from genetic sequences and uncertain infection times
Montazeri Hesam,
Mozaffarilegha Mozhgan,
Little Susan,
Beerenwinkel Niko and
DeGruttola Victor
Additional contact information
Mozaffarilegha Mozhgan: Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Ghods 37, 1417614335 Tehran, Iran
Little Susan: Department of Medicine, University of California San Diego, 220 Dickinson St, San Diego, CA 92103-8208, USA
Beerenwinkel Niko: Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
DeGruttola Victor: Harvard TH Chan School of Public Health, 665 Huntington Ave, Boston, MA 02115, USA
Statistical Applications in Genetics and Molecular Biology, 2020, vol. 19, issue 4-6, 13
Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods generally do not reconstruct transmission trees reliably, because genetic sequence data, or phylogenetic trees inferred from such data, contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of transmission tree reconstruction. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo (MCMC) that draws samples directly from the space of transmission trees under the assumption that the outbreak is completely sampled. The likelihood of each transmission tree is computed with a phylogenetic model that treats its internal nodes as transmission events. Our simulations show that the accuracy of the reconstructed transmission trees depends mainly on the amount of information available on infection times, and that the proposed method outperforms two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and the average number of outbound or inbound edges, which signify possible transmission events from and to those nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak, and investigate the impact of biological, behavioral, and demographic factors.
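The abstract combines three ingredients: MCMC sampling directly over transmission trees under complete sampling, uncertain infection times, and multiple imputation to summarize features such as the number of outbound edges per host. The Python sketch below illustrates how these pieces can fit together under stated simplifying assumptions; it is not the authors' implementation. In particular, the paper scores each tree with a phylogenetic model whose internal nodes are transmission events, whereas here `log_edge_lik` is a hypothetical stand-in in which the SNP distance between infector and infectee is Poisson with mean `rate` times the gap between their infection times. Infection times are assumed known only up to intervals and are drawn uniformly within them; given one draw, an infector must have been infected before its infectee, so the earliest-infected host is the index case. All function names, parameters (`rate`, `n_imputations`, `burn_in`) and the toy data are illustrative assumptions.

```python
import math
import random


def log_edge_lik(dist, dt, rate):
    """Hypothetical stand-in for a phylogenetic likelihood: the SNP distance
    between infector and infectee is Poisson with mean rate * dt."""
    lam = max(rate * dt, 1e-9)  # floor keeps log() defined for tiny gaps
    return dist * math.log(lam) - lam - math.lgamma(dist + 1)


def log_lik(parent, times, dist, rate):
    """Sum edge log-likelihoods; parent[i] is i's infector (None = index case)."""
    return sum(log_edge_lik(dist[p][i], times[i] - times[p], rate)
               for i, p in enumerate(parent) if p is not None)


def mcmc_trees(times, dist, rate, n_iter, rng):
    """Metropolis-Hastings over transmission trees for fixed infection times.

    Assumes complete sampling and at least two hosts: every host except the
    earliest-infected one has an infector among hosts infected before it.
    """
    order = sorted(range(len(times)), key=lambda k: times[k])
    earlier = {i: order[:pos] for pos, i in enumerate(order)}
    parent = [None] * len(times)
    for i in order[1:]:                      # random valid starting tree
        parent[i] = rng.choice(earlier[i])
    cur = log_lik(parent, times, dist, rate)
    samples = []
    for _ in range(n_iter):
        i = rng.choice(order[1:])            # pick a non-index host
        old = parent[i]
        parent[i] = rng.choice(earlier[i])   # symmetric proposal
        new = log_lik(parent, times, dist, rate)
        if rng.random() < math.exp(min(0.0, new - cur)):
            cur = new                        # accept the new infector
        else:
            parent[i] = old                  # reject: restore old infector
        samples.append(list(parent))
    return samples


def impute_out_degree(intervals, dist, rate, n_imputations=20,
                      n_iter=2000, burn_in=500, seed=1):
    """Posterior mean number of outbound edges per host, averaged over
    infection times imputed uniformly within each host's interval."""
    rng = random.Random(seed)
    out = [0.0] * len(intervals)
    for _ in range(n_imputations):
        times = [rng.uniform(lo, hi) for lo, hi in intervals]
        kept = mcmc_trees(times, dist, rate, n_iter, rng)[burn_in:]
        weight = 1.0 / (len(kept) * n_imputations)
        for tree in kept:
            for p in tree:
                if p is not None:
                    out[p] += weight
    return out


# Toy usage (hypothetical data): infection-time intervals and SNP distances.
intervals = [(0.0, 1.0), (2.0, 4.0), (2.0, 4.0), (5.0, 6.0)]
dist = [[0, 2, 3, 5], [2, 0, 2, 3], [3, 2, 0, 2], [5, 3, 2, 0]]
out_degree = impute_out_degree(intervals, dist, rate=1.0)
```

Because this toy likelihood factorizes over edges, each host's infector could in fact be sampled exactly given the times; MCMC becomes necessary once the likelihood, like a phylogenetic one, couples the edges. Averaging the per-imputation estimates corresponds to the point-estimate step of Rubin's rules for multiple imputation.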
Keywords: Ebola; HIV; Markov chain Monte Carlo; phylogenetic analysis; transmission trees
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:19:y:2020:i:4-6:p:13:n:1
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.