Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference

Karcher, Michael D; Palacios, Julia A; Bedford, Trevor; Suchard, Marc A; Minin, Vladimir N

PLOS Computational Biology, 2016, vol. 12, issue 3, 1-19

Abstract: Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals’ genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically-relevant, seasonal human influenza examples.

Author Summary: Phylodynamics seeks to estimate changes in population size from genetic data sampled from individuals across a particular population. One approach to accomplish this task uses a model called the coalescent, which relates the shape of the individuals’ shared ancestral tree to genetic diversity, which is in turn related to population size. However, when analyzing genetic data sampled at different times, current techniques assume that sampling times are fixed ahead of time or are distributed randomly without any relation to the size of the population. Through simulation, we show that when sampling times are related to population size, a situation referred to as preferential sampling, those estimation methods may be systematically biased. To fix this problem, we propose a new model that explicitly accounts for and models the preferential sampling. We show that in the presence of preferential sampling our new technique not only fixes the bias, but also has improved precision in its population size estimates. We also compare the performance of the old and new techniques on several real-world seasonal human influenza examples.
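
Illustration: the abstract describes sampling times as an inhomogeneous Poisson process whose intensity depends on effective population size. The sketch below is not the authors' code; it only shows one way such sampling times could be simulated, using the standard thinning method. The seasonal trajectory `seasonal_ne`, the power-law intensity `scale * ne(t)**power`, and all parameter values are assumptions made for this example and are not taken from the listing above.

```python
import math
import random


def seasonal_ne(t, base=100.0, amplitude=0.8, period=1.0):
    """Hypothetical seasonal effective population size at time t."""
    return base * (1.0 + amplitude * math.cos(2.0 * math.pi * t / period))


def sample_times(ne, t_max, scale, power, ne_max, rng):
    """Simulate event times on [0, t_max] by thinning.

    Assumed intensity: scale * ne(t)**power, with power >= 0 and
    ne(t) <= ne_max on the whole interval, so that lam_max bounds it.
    """
    lam_max = scale * ne_max ** power
    times = []
    t = 0.0
    while True:
        # Candidate events come from a homogeneous process of rate lam_max.
        t += rng.expovariate(lam_max)
        if t > t_max:
            break
        # Keep each candidate with probability intensity(t) / lam_max.
        if rng.random() < scale * ne(t) ** power / lam_max:
            times.append(t)
    return times


rng = random.Random(1)
# ne_max = base * (1 + amplitude) = 180 for the default trajectory.
times = sample_times(seasonal_ne, t_max=4.0, scale=0.5, power=1.0,
                     ne_max=180.0, rng=rng)
# Count samples that fall in the half of each season where Ne is above base.
in_peak = sum(1 for t in times if math.cos(2.0 * math.pi * t) > 0.0)
print(len(times), in_peak)
```

With `power=0.0` the intensity no longer depends on population size and the sampling times are uniform in time, which corresponds to the non-preferential case that, per the abstract, current methods implicitly assume. With `power > 0`, samples cluster where the population is large.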

Date: 2016

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004789 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 04789&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1004789

DOI: 10.1371/journal.pcbi.1004789

More articles in PLOS Computational Biology from Public Library of Science