Systematic identification of intron retention associated variants from massive publicly available transcriptome sequencing data
Yuichi Shiraishi,
Ai Okada,
Kenichi Chiba,
Asuka Kawachi,
Ikuko Omori,
Raúl Nicolás Mateos,
Naoko Iida,
Hirofumi Yamauchi,
Kenjiro Kosaki and
Akihide Yoshimi
Additional contact information
Yuichi Shiraishi: National Cancer Center Research Institute
Ai Okada: National Cancer Center Research Institute
Kenichi Chiba: National Cancer Center Research Institute
Asuka Kawachi: National Cancer Center Research Institute
Ikuko Omori: National Cancer Center Research Institute
Raúl Nicolás Mateos: National Cancer Center Research Institute
Naoko Iida: National Cancer Center Research Institute
Hirofumi Yamauchi: National Cancer Center Research Institute
Kenjiro Kosaki: Keio University School of Medicine
Akihide Yoshimi: National Cancer Center Research Institute
Nature Communications, 2022, vol. 13, issue 1, 1-13
Abstract:
Many disease-associated genomic variants disrupt gene function through abnormal splicing. With the advancement of genomic medicine, identifying splicing-associated variants linked to disease has become more important than ever. Most bioinformatics approaches to detecting splicing-associated variants require both genomic and transcriptomic data. However, few datasets provide both. In this study, we develop a methodology to detect genomic variants that cause splicing changes (more specifically, intron retention) using transcriptome sequencing data alone. After evaluating its sensitivity and precision, we apply it to 230,988 transcriptome sequencing datasets from a publicly available repository and identify 27,049 intron retention associated variants (IRAVs). In addition, by exploring positional relationships with variants registered in existing disease databases, we extract 3,000 putative disease-associated IRAVs, which range from cancer drivers to variants linked with autosomal recessive disorders. This in-silico screening framework demonstrates the possibility of acquiring medical knowledge near-automatically by making the most of the massive body of publicly available sequencing data. Collections of IRAVs identified in this study are available through IRAVDB (https://iravdb.io/).
Date: 2022
Downloads:
https://www.nature.com/articles/s41467-022-32887-9 Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-32887-9
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-022-32887-9
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.