OmniReg-GPT: a high-efficiency foundation model for comprehensive genomic sequence understanding
Aowen Wang,
Jiaqi Li,
Hongyu Dong,
Bocheng Xu,
Qingyu Yin,
Yanchao Xu,
Jie Fu and
Junbo Zhao
Additional contact information
Aowen Wang: Zhejiang University, College of Computer Science and Technology
Jiaqi Li: Zhejiang University Medical Center, Liangzhu Laboratory
Hongyu Dong: Zhejiang University, College of Computer Science and Technology
Bocheng Xu: Zhejiang University
Qingyu Yin: Zhejiang University
Yanchao Xu: Zhejiang University, College of Computer Science and Technology
Jie Fu: Shanghai Artificial Intelligence Laboratory
Junbo Zhao: Zhejiang University, College of Computer Science and Technology
Nature Communications, 2025, vol. 16, issue 1, 1-17
Abstract:
The human genome contains a sophisticated array of elements that regulate gene activity and organismal functions. Developing a large-window foundation model capable of efficiently processing long sequence inputs is essential yet challenging for decoding the multi-layered and complex landscape of cis-regulatory elements. Here, we introduce OmniReg-GPT, a generative foundation model designed for low-resource pretraining on long genomic sequences through an optimized attention mechanism. During pretraining, OmniReg-GPT captures the complete distribution of regulatory elements across nucleotide to megabase scales with efficient training speed and memory usage. We demonstrate exceptional performance in downstream regulatory applications spanning the entire spectrum of genomic scales, including identification of various cis-regulatory elements, context-dependent gene expression prediction, single-cell chromatin accessibility analysis, and 3D chromatin contact modeling. As a generative model, OmniReg-GPT also holds the potential to generate candidate cell-type-specific enhancers through prompt engineering. Overall, OmniReg-GPT extends the boundaries of foundation models in the genomic field and provides a valuable pretrained model resource that can be applied extensively in genomic research.
Date: 2025
Downloads: (external link)
https://www.nature.com/articles/s41467-025-65066-7 Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-65066-7
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-025-65066-7
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.