Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis

Engelhardt, Barbara E; Stephens, Matthew

Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis

Barbara E Engelhardt and Matthew Stephens

PLOS Genetics, 2010, vol. 6, issue 9, 1-12

Abstract: We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more “continuous,” as in isolation-by-distance models.Author Summary: Two different approaches have become widely used in the analysis of population structure: admixture-based models and principal components analysis (PCA). In admixture-based models each individual is assumed to have inherited some proportion of its ancestry from one of several distinct populations. PCA projects the individuals into a low-dimensional subspace. On the face of it, these methods seem to have little in common. Here we show how in fact both of these methods can be viewed within a single unifying framework. This viewpoint should help practitioners to better interpret and contrast the results from these methods in real data applications. It also provides a springboard to the development of novel approaches to this problem. We introduce one such novel approach, based on sparse factor analysis, which has elements in common with both admixture-based models and PCA. As we illustrate here, in some settings sparse factor analysis may provide more interpretable results than either admixture-based models or PCA.

Date: 2010
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (10)

Downloads: (external link)
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1001117 (text/html)
https://journals.plos.org/plosgenetics/article?id= ... 01117&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1001117

DOI: 10.1371/journal.pgen.1001117

Access Statistics for this article

More articles in PLOS Genetics from Public Library of Science
Bibliographic data for series maintained by plosgenetics ().