DIA-BERT: pre-trained end-to-end transformer models for enhanced DIA proteomics data analysis

Zhiwei Liu, Pu Liu, Yingying Sun, Zongxiang Nie, Xiaofan Zhang, Yuqi Zhang, Yi Chen and Tiannan Guo
Additional contact information
Zhiwei Liu: Westlake University
Pu Liu: Westlake Omics (Hangzhou) Biotechnology Co., Ltd.
Yingying Sun: Westlake University
Zongxiang Nie: Westlake University
Xiaofan Zhang: Westlake University
Yuqi Zhang: Westlake University
Yi Chen: Westlake University
Tiannan Guo: Westlake University

Nature Communications, 2025, vol. 16, issue 1, 1-9

Abstract: Data-independent acquisition mass spectrometry (DIA-MS) has become increasingly pivotal in quantitative proteomics. In this study, we present DIA-BERT, a software tool that harnesses a transformer-based pre-trained artificial intelligence (AI) model for analyzing DIA proteomics data. The identification model was trained using over 276 million high-quality peptide precursors extracted from existing DIA-MS files, while the quantification model was trained on 34 million peptide precursors from synthetic DIA-MS files. When compared to DIA-NN, DIA-BERT demonstrated a 51% increase in protein identifications and 22% more peptide precursors on average across five human cancer sample sets (cervical cancer, pancreatic adenocarcinoma, myosarcoma, gallbladder cancer, and gastric carcinoma), achieving high quantitative accuracy. This study underscores the potential of leveraging pre-trained models and synthetic datasets to enhance the analysis of DIA proteomics.

Date: 2025

Downloads: https://www.nature.com/articles/s41467-025-58866-4 (abstract, text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-58866-4

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-025-58866-4

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-05-10
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-58866-4