Bayesian estimation of community size and overlap from random subsamples

Johnson, Erik K; Larremore, Daniel B

PLOS Computational Biology, 2022, vol. 18, issue 9, 1-16

Abstract: Counting the number of species, items, or genes that are shared between two groups, sets, or communities is a simple calculation when sampling is complete. However, when only partial samples are available, quantifying the overlap between two communities becomes an estimation problem. Furthermore, to calculate normalized measures of β-diversity, such as the Jaccard and Sørensen-Dice indices, one must also estimate the total sizes of the communities being compared. Previous efforts to address these problems have assumed knowledge of total community sizes and then used Bayesian methods to produce unbiased estimates with quantified uncertainty. Here, we address communities of unknown size and show that doing so produces systematically better estimates, both in terms of central estimates and quantification of uncertainty in those estimates. We further show how to use species, item, or gene count data to refine estimates of community size in a Bayesian joint model of community size and overlap.

Author summary: When two sets of species, genes, or items have been completely enumerated, quantifying the overlap between the sets is as simple as comparing their contents. However, in many applications, only random samples from the two sets are available, forcing the problem of overlap quantification into the realm of inference. Using a Bayesian inference approach, this paper shows how one can use random samples from two sets to simultaneously estimate the total size of each set, as well as the overlap between them. Rather than learning from the presence and absence of each species, gene, or item alone, as in prior work, this method utilizes the total number of samples drawn from each set to aid in the inference process. By drawing on this additional information, overlap estimates are more confident and accurate. These methods not only allow inference from existing data, but also enable prospective sample size calculations via simulation.
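To make the problem statement concrete, the following is a minimal Python sketch, not the paper's method. It computes the Jaccard and Sørensen-Dice indices for two fully enumerated sets, then shows that applying the same formulas directly to random subsamples tends to understate the true overlap. The community contents, sizes, sample size, trial count, and all function names are hypothetical choices made for illustration only.

```python
import random


def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sorensen_dice(a, b):
    """Sørensen-Dice index: 2|A ∩ B| / (|A| + |B|) (0.0 if both are empty)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical communities: 100 items each, 40 of them shared (items 60..99).
community_a = set(range(0, 100))
community_b = set(range(60, 160))

# With complete enumeration, both indices are simple calculations.
true_jaccard = jaccard(community_a, community_b)      # 40 / 160
true_dice = sorensen_dice(community_a, community_b)   # 80 / 200

rng = random.Random(0)  # fixed seed so the illustration is repeatable


def mean_naive_jaccard(sample_size, trials=200):
    """Average Jaccard index computed directly on random subsamples."""
    total = 0.0
    for _ in range(trials):
        # random.sample needs a sequence, so convert each set to a sorted list.
        sample_a = rng.sample(sorted(community_a), sample_size)
        sample_b = rng.sample(sorted(community_b), sample_size)
        total += jaccard(sample_a, sample_b)
    return total / trials


# Sampling half of each community: a shared item must be drawn in BOTH
# samples to be counted, so the naive index falls well below the true value.
naive_jaccard = mean_naive_jaccard(50)

print(f"true Jaccard        = {true_jaccard:.3f}")
print(f"true Sørensen-Dice  = {true_dice:.3f}")
print(f"mean naive Jaccard  = {naive_jaccard:.3f}")
```

The gap between the true and naive values is the reason overlap under partial sampling is treated as an inference problem rather than a direct count.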

Date: 2022

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010451 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 10451&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1010451

DOI: 10.1371/journal.pcbi.1010451
