G4mer: An RNA language model for transcriptome-wide identification of G-quadruplexes and disease variants from population-scale genetic data

Farica Zhuang, Danielle Gutman, Nathaniel Islas, Bryan B. Guzman, Alli Jimenez, San Jewell, Nicholas J. Hand, Katherine Nathanson, Daniel Dominguez and Yoseph Barash
Additional contact information
Farica Zhuang: University of Pennsylvania, Department of Computer and Information Science
Danielle Gutman: University of Pennsylvania, Department of Genetics, Perelman School of Medicine
Nathaniel Islas: University of Pennsylvania, Department of Computer and Information Science
Bryan B. Guzman: University of North Carolina at Chapel Hill, Department of Pharmacology
Alli Jimenez: University of North Carolina at Chapel Hill, Department of Biochemistry and Biophysics
San Jewell: University of Pennsylvania, Department of Genetics, Perelman School of Medicine
Nicholas J. Hand: University of Pennsylvania, Department of Genetics, Perelman School of Medicine
Katherine Nathanson: University of Pennsylvania, Division of Human Genetics and Translational Medicine, Dept of Medicine, Perelman School of Medicine
Daniel Dominguez: University of North Carolina at Chapel Hill, Department of Pharmacology
Yoseph Barash: University of Pennsylvania, Department of Computer and Information Science

Nature Communications, 2025, vol. 16, issue 1, 1-17

Abstract: RNA G-quadruplexes (rG4s) are key regulatory elements in gene expression, yet the effects of genetic variants on rG4 formation remain underexplored. Here, we introduce G4mer, an RNA language model that predicts rG4 formation, classifies rG4 subtypes, and evaluates the effects of genetic variants across the transcriptome. G4mer significantly improves accuracy over existing methods and uncovers subtype-specific differences in mutational sensitivity and evolutionary constraint, highlighting sequence length and flanking motifs as important rG4 features. Applying G4mer to 5′ untranslated region (UTR) variations, we identify variants in breast cancer-associated genes that alter rG4 formation and validate their impact on structure and gene expression. These results demonstrate the potential of integrating computational models with experimental approaches to study rG4 function, especially in diseases where non-coding variants are often overlooked. To support broader applications, G4mer is available as both a web tool and a downloadable model.

Date: 2025

Downloads: (external link)
https://www.nature.com/articles/s41467-025-65020-7 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-65020-7

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-025-65020-7

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-12-06
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-65020-7