Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees

Tom M W Nye, Xiaoxian Tang, Grady Weyenberg and Ruriko Yoshida

Biometrika, 2017, vol. 104, issue 4, 901-922

Abstract: Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample’s structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the $k$th principal component in Euclidean space: the locus of the weighted Fréchet mean of $k+1$ vertex trees when the weights vary over the $k$-simplex. We establish some basic properties of these objects, in particular showing that they have dimension $k$, and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.
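
The construction named in the abstract can be written out explicitly. The following is a sketch in notation assumed here rather than taken from the abstract: $d$ denotes the geodesic distance on tree space, $v_0,\dots,v_k$ the vertex trees, $\Delta^k$ the $k$-simplex of weights, and $\Pi$ the resulting locus. It also assumes that the minimiser is unique.

```latex
% Assumed notation (not from the abstract): d, v_i, \Delta^k, \mu, \Pi.
% Requires amsmath for \operatorname*.
\[
\Delta^k = \Bigl\{ w = (w_0,\dots,w_k) : w_i \ge 0,\ \sum_{i=0}^{k} w_i = 1 \Bigr\},
\qquad
\mu(w) = \operatorname*{arg\,min}_{x} \sum_{i=0}^{k} w_i \, d(x, v_i)^2 ,
\]
\[
\Pi(v_0,\dots,v_k) = \bigl\{ \mu(w) : w \in \Delta^k \bigr\}.
\]
```

In a Euclidean space the minimiser is $\mu(w) = \sum_{i} w_i v_i$, so the locus is the convex hull of the vertices and lies in an affine subspace of dimension at most $k$. This is the sense in which the locus plays the role of a $k$th principal component.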

Keywords: Fréchet mean; Phylogenetic tree; Principal component analysis; Tree space
Date: 2017

Downloads: (external link)
http://hdl.handle.net/10.1093/biomet/asx047 (application/pdf)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:oup:biomet:v:104:y:2017:i:4:p:901-922.

Ordering information: This journal article can be ordered from
https://academic.oup.com/journals

Biometrika is currently edited by Paul Fearnhead

More articles in Biometrika from Biometrika Trust Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, UK.
Bibliographic data for series maintained by Oxford University Press.

 
Page updated 2025-03-19
Handle: RePEc:oup:biomet:v:104:y:2017:i:4:p:901-922.