SMAP is a pipeline for sample matching in proteogenomics

Ling Li, Mingming Niu, Alyssa Erickson, Jie Luo, Kincaid Rowbotham, Kai Guo, He Huang, Yuxin Li, Yi Jiang, Junguk Hur, Chunyu Liu, Junmin Peng and Xusheng Wang
Additional contact information
Ling Li: University of North Dakota
Mingming Niu: Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital
Alyssa Erickson: University of North Dakota
Jie Luo: State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Zhejiang Academy of Agricultural Sciences
Kincaid Rowbotham: University of North Dakota
Kai Guo: University of Michigan
He Huang: University of North Dakota
Yuxin Li: Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital
Yi Jiang: School of Public Health, Tongji Medical College, Huazhong University of Science and Technology
Junguk Hur: School of Medicine and Health Sciences, University of North Dakota
Chunyu Liu: SUNY Upstate Medical University
Junmin Peng: Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital
Xusheng Wang: University of North Dakota

Nature Communications, 2022, vol. 13, issue 1, 1-9

Abstract: The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS), and aligns the MS-based proteomic samples with genomic samples by two discriminant scores. Theoretical analysis with simulated data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when ≥20% genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in a large-scale MS-based proteogenomics study. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based version can be accessed at https://smap.shinyapps.io/smap/.
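
The matching idea in the abstract can be illustrated with a minimal sketch. It is not SMAP's implementation: the abstract does not define the two discriminant scores, so the sketch assumes two hypothetical ones, a genotype concordance fraction and the margin between the best and second-best candidate. The function names, the 0/1/2 genotype coding, and the sample IDs are likewise assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed coding: alternate-allele count (0, 1, 2); None marks a genotype
# that could not be inferred from the proteomic data.
Genotype = Optional[int]


def concordance(inferred: List[Genotype], reference: List[Genotype]) -> float:
    """Fraction of sites called in both samples where the genotypes agree."""
    pairs = [(a, b) for a, b in zip(inferred, reference)
             if a is not None and b is not None]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def match_sample(
    inferred: List[Genotype],
    genomic: Dict[str, List[Genotype]],
) -> Tuple[str, float, float]:
    """Return (best genomic sample ID, its concordance, margin over runner-up).

    Both scores are illustrative stand-ins, not the scores used by SMAP.
    """
    ranked = sorted(
        ((concordance(inferred, g), sid) for sid, g in genomic.items()),
        reverse=True,
    )
    best_score, best_id = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    return best_id, best_score, best_score - runner_up


# Hypothetical data: three genomic samples and one proteomic sample with
# one missing genotype.
genomic = {
    "G1": [0, 1, 2, 0, 1],
    "G2": [2, 1, 0, 0, 1],
    "G3": [1, 1, 1, 2, 0],
}
inferred = [0, 1, 2, None, 1]

best_id, score, margin = match_sample(inferred, genomic)
print(best_id, score, margin)
```

In a check of this kind, a proteomic sample whose best match differs from its labelled genomic sample, with a high concordance and a wide margin, would be a candidate for correction.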

Date: 2022

Downloads: (external link)
https://www.nature.com/articles/s41467-022-28411-8 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-28411-8

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-022-28411-8

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
