EconPapers    
Economics at your fingertips  
 

Consistent RNA sequencing contamination in GTEx and other data sets

Tim O. Nieuwenhuis, Stephanie Y. Yang, Rohan X. Verma, Vamsee Pillalamarri, Dan E. Arking, Avi Z. Rosenberg, Matthew N. McCall and Marc K. Halushka ()
Additional contact information
Tim O. Nieuwenhuis: Johns Hopkins University SOM
Stephanie Y. Yang: Johns Hopkins University SOM
Rohan X. Verma: Johns Hopkins University SOM
Vamsee Pillalamarri: Johns Hopkins University SOM
Dan E. Arking: Johns Hopkins University SOM
Avi Z. Rosenberg: Johns Hopkins University SOM
Matthew N. McCall: University of Rochester Medical Center
Marc K. Halushka: Johns Hopkins University SOM

Nature Communications, 2020, vol. 11, issue 1, 1-10

Abstract: Abstract A challenge of next generation sequencing is read contamination. We use Genotype-Tissue Expression (GTEx) datasets and technical metadata along with RNA-seq datasets from other studies to understand factors that contribute to contamination. Here we report, of 48 analyzed tissues in GTEx, 26 have variant co-expression clusters of four highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and/or CELA3A). Fourteen additional highly expressed genes from other tissues also indicate contamination. Sample contamination is strongly associated with a sample being sequenced on the same day as a tissue that natively expresses those genes. Discrepant SNPs across four contaminating genes validate the contamination. Low-level contamination affects ~40% of samples and leads to numerous eQTL assignments in inappropriate tissues among these 18 genes. This type of contamination occurs widely, impacting bulk and single cell (scRNA-seq) data set analysis. In conclusion, highly expressed, tissue-enriched genes basally contaminate GTEx and other datasets impacting analyses.

Date: 2020
References: Add references at CitEc
Citations:

Downloads: (external link)
https://www.nature.com/articles/s41467-020-15821-9 Abstract (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-15821-9

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-020-15821-9

Access Statistics for this article

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-19
Handle: RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-15821-9