Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence

Lusk, Ryan; Stene, Evan; Banaei-Kashani, Farnoush; Tabakoff, Boris; Kechris, Katerina; Saba, Laura M.

Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence

Ryan Lusk (), Evan Stene, Farnoush Banaei-Kashani, Boris Tabakoff, Katerina Kechris and Laura M. Saba
Additional contact information
Ryan Lusk: University of Colorado Anschutz Medical Campus
Evan Stene: University of Colorado Denver
Farnoush Banaei-Kashani: University of Colorado Denver
Boris Tabakoff: University of Colorado Anschutz Medical Campus
Katerina Kechris: University of Colorado Anschutz Medical Campus
Laura M. Saba: University of Colorado Anschutz Medical Campus

Nature Communications, 2021, vol. 12, issue 1, 1-13

Abstract: Abstract Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3′-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model—trained using the Human Brain Reference RNA commercial standard—performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi’s input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.

Date: 2021
References: Add references at CitEc
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://www.nature.com/articles/s41467-021-21894-x Abstract (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-21894-x

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-021-21894-x

Access Statistics for this article

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().