RNA language models predict mutations that improve RNA function
Yekaterina Shulgina,
Marena I. Trinidad,
Conner J. Langeberg,
Hunter Nisonoff,
Seyone Chithrananda,
Petr Skopintsev,
Amos J. Nissley,
Jaymin Patel,
Ron S. Boger,
Honglue Shi,
Peter H. Yoon,
Erin E. Doherty,
Tara Pande,
Aditya M. Iyer,
Jennifer A. Doudna and
Jamie H. D. Cate
Additional contact information
Yekaterina Shulgina: University of California
Marena I. Trinidad: University of California
Conner J. Langeberg: University of California
Hunter Nisonoff: University of California
Seyone Chithrananda: University of California
Petr Skopintsev: University of California
Amos J. Nissley: University of California
Jaymin Patel: University of California
Ron S. Boger: University of California
Honglue Shi: University of California
Peter H. Yoon: University of California
Erin E. Doherty: University of California
Tara Pande: University of California
Aditya M. Iyer: University of California
Jennifer A. Doudna: University of California
Jamie H. D. Cate: University of California
Nature Communications, 2024, vol. 15, issue 1, 1-17
Abstract:
Structured RNA lies at the heart of many central biological processes, from gene expression to catalysis. RNA structure prediction is not yet possible due to a lack of high-quality reference data associated with organismal phenotypes that could inform RNA function. We present GARNET (Gtdb Acquired RNa with Environmental Temperatures), a new database for RNA structural and functional analysis anchored to the Genome Taxonomy Database (GTDB). GARNET links RNA sequences to experimental and predicted optimal growth temperatures of GTDB reference organisms. Using GARNET, we develop sequence- and structure-aware RNA generative models, with overlapping triplet tokenization providing optimal encoding for a GPT-like model. Leveraging hyperthermophilic RNAs in GARNET and these RNA generative models, we identify mutations in ribosomal RNA that confer increased thermostability to the Escherichia coli ribosome. The GTDB-derived data and deep learning models presented here provide a foundation for understanding the connections between RNA sequence, structure, and function.
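The abstract highlights overlapping triplet tokenization as the encoding scheme for the GPT-like RNA model. The sketch below is an illustration of the general idea only, not the authors' implementation: the vocabulary layout, special tokens, and function names are assumptions. It encodes an RNA sequence as stride-1 triplets, so a sequence of length n yields n - 2 tokens drawn from a 64-triplet vocabulary.

```python
# Minimal sketch of overlapping (stride-1) triplet tokenization for RNA.
# Vocabulary layout and special tokens are illustrative assumptions,
# not taken from the GARNET codebase.
from itertools import product

BASES = "ACGU"
# 64 possible RNA triplets plus a few hypothetical special tokens.
VOCAB = ["<pad>", "<bos>", "<eos>", "<unk>"] + ["".join(t) for t in product(BASES, repeat=3)]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

def tokenize_overlapping_triplets(rna: str) -> list[int]:
    """Encode an RNA sequence as overlapping triplet token IDs (stride 1)."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    triplets = [rna[i:i + 3] for i in range(len(rna) - 2)]
    ids = [TOKEN_TO_ID.get(t, TOKEN_TO_ID["<unk>"]) for t in triplets]
    return [TOKEN_TO_ID["<bos>"], *ids, TOKEN_TO_ID["<eos>"]]

# Example: a 7-nt sequence yields 5 overlapping triplet tokens.
print(tokenize_overlapping_triplets("GCAUGCA"))
```

Overlapping triplets preserve single-nucleotide resolution (each position appears in up to three tokens) while giving the model a richer local context per token than single-base tokenization.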
Date: 2024
Downloads: https://www.nature.com/articles/s41467-024-54812-y (abstract, text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-54812-y
Ordering information: This journal article can be ordered from https://www.nature.com/ncomms/
DOI: 10.1038/s41467-024-54812-y
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie