The site frequency spectrum of dispensable genes

Franz Baumdicker

Theoretical Population Biology, 2015, vol. 100, issue C, 13-25

Abstract: Differences between DNA sequences within a population are the basis for inferring the ancestral relationships of its individuals. Within the classical infinitely many sites model, the mutation rate can be estimated from the site frequency spectrum, which consists of the numbers $C_1, \dots, C_{n-1}$, where $n$ is the sample size and $C_s$ is the number of site mutations (Single Nucleotide Polymorphisms, SNPs) seen in exactly $s$ genomes. Classical results can be used to compare the observed site frequency spectrum with its neutral expectation, $E[C_s] = \theta_2/s$, where $\theta_2$ is the scaled site mutation rate. In this paper, we relax the assumption of the infinitely many sites model that all individuals carry only homologous genetic material. In particular, it is now well known that bacterial genomes can gain and lose genes, so that every genome is a mosaic of genes that are present or absent in a random fashion, giving rise to the dispensable genome. While this presence and absence has been modeled under neutral evolution within the infinitely many genes model in Baumdicker et al. (2010), we link the presence and absence of genes with the numbers of site mutations seen within each gene. We derive a formula for the expectation of the joint gene and site frequency spectrum, denoted by $G_{k,s}$: the number of mutated sites occurring in exactly $s$ gene sequences while the corresponding gene is present in exactly $k$ individuals. We show that standard estimators of $\theta_2$ are biased for dispensable genes and that the site frequency spectrum for dispensable genes differs from the classical result.
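To make the classical quantities in the abstract concrete, the following minimal Python sketch computes a site frequency spectrum from a 0/1 alignment and applies two textbook estimators of the scaled mutation rate (Watterson's estimator and the pairwise-difference estimator that underlie Tajima's D). It illustrates only the classical infinitely many sites setting, not the paper's formula for the joint spectrum. The function names and the toy input are assumptions made for illustration.

```python
from math import comb


def site_frequency_spectrum(haplotypes):
    """Return [C_1, ..., C_{n-1}] for n sampled genomes.

    `haplotypes` is a list of equal-length 0/1 lists, one per genome,
    where 1 marks the mutated (derived) state at a site.
    """
    n = len(haplotypes)
    counts = [0] * (n + 1)  # counts[s] = number of sites mutated in s genomes
    for column in zip(*haplotypes):
        counts[sum(column)] += 1
    return counts[1:n]  # drop s = 0 and s = n (non-segregating sites)


def expected_sfs(theta, n):
    """Classical neutral expectation E[C_s] = theta / s for s = 1..n-1."""
    return [theta / s for s in range(1, n)]


def watterson_theta(sfs):
    """Watterson's estimator: number of segregating sites over the harmonic sum."""
    n = len(sfs) + 1
    a_n = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a_n


def pairwise_theta(sfs):
    """Mean number of pairwise differences, computed from the spectrum."""
    n = len(sfs) + 1
    return sum(s * (n - s) * c for s, c in enumerate(sfs, start=1)) / comb(n, 2)


# Toy alignment (assumed data): 4 genomes, 4 sites.
toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
toy_sfs = site_frequency_spectrum(toy)
print(toy_sfs, watterson_theta(toy_sfs), pairwise_theta(toy_sfs))
```

Applied to the neutral expectation itself, both estimators return the input rate, which is the sense in which they are unbiased in the classical model; according to the abstract, this no longer holds for dispensable genes.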

Keywords: Population genetics; Site frequency spectrum; Dispensable gene; Pangenome; Tajima’s D
Date: 2015

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0040580914000975
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:thpobi:v:100:y:2015:i:c:p:13-25

DOI: 10.1016/j.tpb.2014.12.001

Theoretical Population Biology is currently edited by Jeremy Van Cleve

Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-19
Handle: RePEc:eee:thpobi:v:100:y:2015:i:c:p:13-25