EconPapers    
Economics at your fingertips  
 

PhyloTune: An efficient method to accelerate phylogenetic updates using a pretrained DNA language model

Danruo Deng, Wuqin Xu, Bian Wu, Hans Peter Comes, Yu Feng, Pan Li, Jinfang Zheng, Guangyong Chen and Pheng-Ann Heng
Additional contact information
Danruo Deng: The Chinese University of Hong Kong
Wuqin Xu: Kechuang Avenue
Bian Wu: Kechuang Avenue
Hans Peter Comes: Salzburg University
Yu Feng: Chinese Academy of Sciences
Pan Li: Zhejiang University
Jinfang Zheng: Kechuang Avenue
Guangyong Chen: Hangzhou Institute of Medicine Chinese Academy of Sciences
Pheng-Ann Heng: The Chinese University of Hong Kong

Nature Communications, 2025, vol. 16, issue 1, 1-13

Abstract: Understanding the phylogenetic relationships among species is crucial for comprehending major evolutionary transitions. With the ever-growing volume of sequence data, constructing reliable phylogenetic trees efficiently is becoming more challenging for current analytical methods. In this study, we introduce a new solution that uses a pretrained DNA language model to accelerate the integration of novel taxa into an existing phylogenetic tree. Our approach identifies the taxonomic unit of a newly collected sequence using existing taxonomic classification systems and then updates the corresponding subtree. Specifically, we leverage a pretrained BERT network to obtain high-dimensional sequence representations, which are used not only to determine the subtree to be updated but also to identify potentially valuable regions for subtree construction. We demonstrate the effectiveness of our method, named PhyloTune, through experiments on simulated datasets, as well as on our curated Plant (focusing on Embryophyta) and microbial (focusing on the genus Bordetella) datasets. Our findings provide evidence that phylogenetic trees can be constructed by automatically selecting the most informative regions of sequences, without manual selection of molecular markers. This discovery offers guidance for further research into the functional aspects of different regions of DNA sequences, enriching our understanding of biology.
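The abstract describes two uses of the sequence embeddings: placing a new sequence into a taxonomic unit, and picking informative regions for subtree construction. The following is a minimal sketch of both ideas under stated assumptions, not the authors' PhyloTune implementation. It assumes embeddings have already been computed (for example by a BERT-style DNA model) and are given as plain lists of floats. It assigns a query to the taxon whose mean reference embedding is most cosine-similar, and it chooses the top non-overlapping windows from per-position scores, which are a hypothetical input because the abstract does not say how regions are scored. All function names and labels are hypothetical, and rebuilding the subtree itself is out of scope.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def assign_taxon(query: Sequence[float],
                 reference: Dict[str, List[Sequence[float]]]) -> str:
    """Hypothetical placement step: pick the taxon label whose mean
    reference embedding is most cosine-similar to the query embedding."""
    centroids = {label: centroid(vs) for label, vs in reference.items()}
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


def top_regions(scores: Sequence[float], window: int,
                k: int) -> List[Tuple[int, int]]:
    """Hypothetical region-selection step: return up to k non-overlapping
    half-open windows [start, end) with the highest summed per-position score."""
    sums = [(sum(scores[i:i + window]), i)
            for i in range(len(scores) - window + 1)]
    # Highest score first; ties broken by the earlier start position.
    sums.sort(key=lambda t: (-t[0], t[1]))
    chosen: List[Tuple[int, int]] = []
    for _, start in sums:
        # Keep a window only if it does not overlap any already chosen one.
        if all(start + window <= s or start >= e for s, e in chosen):
            chosen.append((start, start + window))
            if len(chosen) == k:
                break
    return sorted(chosen)


if __name__ == "__main__":
    # Toy 2-D "embeddings" for two hypothetical clades.
    reference = {
        "clade_A": [[1.0, 0.1], [0.9, 0.2]],
        "clade_B": [[0.1, 1.0], [0.2, 0.9]],
    }
    print(assign_taxon([0.95, 0.15], reference))
    print(top_regions([0, 1, 5, 4, 0, 0, 3, 3, 0], window=2, k=2))
```

Using class centroids keeps the placement step cheap: each new sequence is compared against one vector per taxon rather than against every reference sequence, which matches the abstract's goal of updating only the relevant part of a large tree, though the paper may use a different classifier.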

Date: 2025

Downloads: (external link)
https://www.nature.com/articles/s41467-025-61684-3 Abstract (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-61684-3

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-025-61684-3


Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-07-28
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-61684-3