EconPapers    
CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells

Yuansong Zeng (), Jiancong Xie, Ningyuan Shangguan, Zhuoyi Wei, Wenbing Li, Yun Su, Shuangyu Yang, Chengyang Zhang, Jinbo Zhang, Nan Fang, Hongyu Zhang, Yutong Lu, Huiying Zhao (), Jue Fan (), Weijiang Yu () and Yuedong Yang ()
Additional contact information
Yuansong Zeng: Sun Yat-sen University
Jiancong Xie: Sun Yat-sen University
Ningyuan Shangguan: Sun Yat-sen University
Zhuoyi Wei: Sun Yat-sen University
Wenbing Li: Sun Yat-sen University
Yun Su: Ltd
Shuangyu Yang: Sun Yat-sen Memorial Hospital, Sun Yat-sen University
Chengyang Zhang: Chongqing University
Jinbo Zhang: Nanjing
Nan Fang: Nanjing
Hongyu Zhang: Chongqing University
Yutong Lu: Sun Yat-sen University
Huiying Zhao: Sun Yat-sen Memorial Hospital, Sun Yat-sen University
Jue Fan: Nanjing
Weijiang Yu: Sun Yat-sen University
Yuedong Yang: Sun Yat-sen University

Nature Communications, 2025, vol. 16, issue 1, 1-17

Abstract: Single-cell sequencing provides transcriptomic profiling at single-cell resolution, uncovering cellular heterogeneity with unprecedented precision. Yet single-cell data analysis suffers from inherent noise, batch effects, and sparsity, highlighting the need for a unified model of cellular states. To address this, many recent efforts have focused on training single-cell foundation models on large datasets. However, current human foundation models remain limited by the size of their training data and their parameter counts. Here, we collect a diverse dataset of 100 million human cells, on which we train a single-cell foundation model (CellFM) containing 800 million parameters. To balance efficiency and performance, the model is trained with a modified RetNet architecture on the MindSpore framework. Extensive experiments show that CellFM outperforms existing models in cell annotation, perturbation prediction, gene function prediction, and gene-gene relationship capturing.
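The abstract notes that CellFM uses a modified RetNet backbone. As background, the core of RetNet is the retention mechanism, whose recurrent form updates a decayed key-value state rather than attending over all positions. The sketch below is a minimal single-head illustration of that generic mechanism, assuming standard retention with a scalar decay `gamma`; it is not CellFM's actual implementation, and the dimensions and decay value are arbitrary.

```python
import numpy as np

def retention(Q, K, V, gamma=0.9):
    """Recurrent-form retention (illustrative, single head).

    A state matrix S accumulates outer products K[t]^T V[t] with
    exponential decay gamma; each output is the query applied to S.
    """
    d = Q.shape[-1]
    S = np.zeros((d, V.shape[-1]))
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        S = gamma * S + np.outer(K[t], V[t])  # decayed key-value memory
        out[t] = Q[t] @ S / np.sqrt(d)        # query the accumulated state
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 4)) for _ in range(3))
Y = retention(Q, K, V)
print(Y.shape)  # (5, 4)
```

Because the state update is O(1) per token, this recurrent form gives constant-memory inference, which is one reason retention-style backbones are attractive for very large training corpora such as the 100-million-cell dataset described here.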

Date: 2025

Downloads: (external link)
https://www.nature.com/articles/s41467-025-59926-5 Abstract (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59926-5

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-025-59926-5


Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-05-22
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59926-5