PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

Oren E Livne, Lide Han, Gorka Alkorta-Aranburu, William Wentworth-Sheilds, Mark Abney, Carole Ober and Dan L Nicolae

PLOS Computational Biology, 2015, vol. 11, issue 3, 1-14

Abstract: Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants to human phenotypes, as well as parental origin effects of disease risk alleles, in >1,000 individuals at minimal cost.

Author Summary: The recent availability of whole genome and whole exome sequencing allows genetic studies of human diseases and traits at an unprecedented resolution, although the cost of sequencing limits the size of the studied sample. To overcome this limitation and design cost-efficient studies, we developed a two-step method: sequencing of relatively few members of a well-characterized founder population, followed by pedigree-based whole genome imputation of many other individuals with genome-wide genotype data. We show that by sequencing only 98 Hutterites, we can impute 7 million variants in an additional 1,317 Hutterites with >99% accuracy and an average call rate of 87%. Furthermore, parental origin was assigned to 83% of the alleles. Such studies in the Hutterites and other founder populations should yield new insights into the genetic architecture of common diseases, gene expression traits, and clinically relevant biomarkers of disease, and ultimately provide outstanding opportunities for personalized medicine in these well-characterized populations.

Date: 2015

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004139 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 04139&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1004139

DOI: 10.1371/journal.pcbi.1004139

More articles in PLOS Computational Biology from Public Library of Science
Handle: RePEc:plo:pcbi00:1004139