Transcript assembly and annotations: Bias and adjustment

Qimin Zhang and Mingfu Shao

PLOS Computational Biology, 2023, vol. 19, issue 12, 1-20

Abstract: Transcript annotations play a critical role in gene expression analysis, as they serve as a reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but discrepancies between their methodologies and information resources can lead to significant differences. It has been demonstrated that the choice of annotation can have a significant impact on gene expression analysis. Furthermore, transcript assembly is closely linked to annotations: assembling large-scale available RNA-seq data is an effective data-driven way to construct annotations, and annotations often serve as benchmarks for evaluating the accuracy of assembly methods. However, the influence of different annotations on transcript assembly is not yet fully understood. We investigate the impact of annotations on transcript assembly. Surprisingly, we observe that opposite conclusions can arise when evaluating assemblers with different annotations. To understand this striking phenomenon, we compare the structural similarity of annotations at various levels and find that the primary structural difference across annotations occurs at the intron-chain level. Next, we examine the biotypes of annotated and assembled transcripts and uncover a significant bias towards annotating and assembling transcripts with intron retentions, which explains the contradictory conclusions above. We develop a standalone tool, available at https://github.com/Shao-Group/irtool, that can be combined with an assembler to generate an assembly without intron retentions. We evaluate the performance of such a pipeline and offer guidance on selecting appropriate assembly tools for different application scenarios.

Author summary: Transcript annotations are essential foundations for transcriptomic studies, offering valuable insights into gene structures and functions and acting as references for isoform-level expression quantification and differential analysis. However, the impact of different annotations on transcript assembly remains uncertain. We demonstrated that the choice of an annotation can lead to conflicting outcomes when evaluating assemblers. Through a comprehensive comparison of annotations from the perspectives of biotypes and gene structures, our investigation revealed the distinctive features of annotations that led to this contradictory conclusion, contributing to a broader and deeper understanding of annotations. Our research provides guidance for making well-informed choices of annotations and assemblers for practical RNA-seq data analysis.
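
To make the two structural notions in the abstract concrete, the following is a minimal Python sketch of an intron chain and of a simplified intron-retention check. It is an illustration only and not the method implemented in irtool: the exon representation, the function names `intron_chain` and `has_intron_retention`, and the rule "an exon that strictly spans an intron of another transcript counts as a retention" are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: a transcript is a list of exons, each given
# as (start, end) in 1-based inclusive coordinates on the same strand.
Exon = Tuple[int, int]
Intron = Tuple[int, int]


def intron_chain(exons: List[Exon]) -> Tuple[Intron, ...]:
    """Return the introns (gaps between consecutive exons) in genomic order."""
    ordered = sorted(exons)
    return tuple(
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    )


def has_intron_retention(candidate: List[Exon], others: List[List[Exon]]) -> bool:
    """Simplified check (an assumption, not irtool's rule): the candidate is
    flagged if one of its exons strictly spans an intron of another transcript."""
    for other in others:
        for intron_start, intron_end in intron_chain(other):
            for exon_start, exon_end in candidate:
                if exon_start < intron_start and intron_end < exon_end:
                    return True
    return False


# Toy transcripts with made-up coordinates.
spliced = [(100, 200), (300, 400), (500, 600)]
retained = [(100, 200), (300, 600)]          # exon covers the intron 401-499
longer_end = [(100, 200), (300, 400), (500, 650)]

same_chain = intron_chain(spliced) == intron_chain(longer_end)
flagged = has_intron_retention(retained, [spliced])
```

In this toy example, `spliced` and `longer_end` differ in their last exon boundary but share an intron chain, which is the level of comparison the abstract refers to, while `retained` is flagged because its second exon spans an intron of `spliced`.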

Date: 2023
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011734 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 11734&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1011734

DOI: 10.1371/journal.pcbi.1011734

More articles in PLOS Computational Biology from Public Library of Science