Reproducibility of biomarker identifications from mass spectrometry proteomic data in cancer studies

Yulan, Liang; Adam, Kelemen; Arpad, Kelemen

Liang Yulan, Kelemen Adam and Kelemen Arpad
Additional contact information
Liang Yulan: Department of Family and Community Health, University of Maryland, Baltimore, MD 21201-1579, USA
Kelemen Adam: Department of Computer Science, University of Maryland, College Park, MD 20742, USA
Kelemen Arpad: Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201-1579, USA

Statistical Applications in Genetics and Molecular Biology, 2019, vol. 18, issue 3, 13

Abstract: Reproducibility of disease signatures and clinical biomarkers in multi-omics disease analysis has been a key challenge due to a multitude of factors. The heterogeneity of the limited sample, biological factors such as environmental confounders, and inherent experimental and technical noise, compounded by the inadequacy of statistical tools, can lead to misinterpretation of results and, subsequently, to very different biological conclusions. In this paper, we use mass spectrometry proteomic ovarian tumor data to investigate biomarker reproducibility issues potentially caused by differences among statistical methods with varied distribution assumptions or marker selection criteria. We examine the relationship between effect sizes, p values, Cauchy p values, False Discovery Rate p values, and the rank fractions of identified proteins out of thousands in the limited heterogeneous sample. We compare the markers identified by statistical single-feature selection approaches with those identified by machine learning wrapper methods. The results reveal marked differences among the protein markers selected by the various methods, with potential selection biases and false discoveries, which may be due to small effects, different distribution assumptions, and the use of p value type criteria versus prediction accuracies. Alternative solutions and other related issues are discussed in support of the reproducibility of findings for clinically actionable outcomes.
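To make the quantities named in the abstract concrete, the following is a minimal sketch in Python of how an effect size, False Discovery Rate adjusted p values, and rank fractions might be computed for a handful of features. It is not the authors' code: the function names (`cohens_d`, `bh_adjust`, `rank_fraction`) are hypothetical, Cohen's d and the Benjamini-Hochberg procedure are assumed here as common choices that the abstract does not confirm, and the Cauchy p values are omitted because the abstract does not define how they are obtained.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_fraction(scores):
    """Rank of each score (1 = smallest) divided by the number of features.

    Ties are broken by input order, which is an arbitrary choice for this sketch.
    """
    m = len(scores)
    order = sorted(range(m), key=lambda i: scores[i])
    fractions = [0.0] * m
    for rank, i in enumerate(order, start=1):
        fractions[i] = rank / m
    return fractions
```

With toy inputs, `bh_adjust([0.01, 0.04, 0.03, 0.005])` shows how raw p values are inflated by the multiple-testing correction, and `rank_fraction` of the result shows where each feature falls among all features tested. A feature can rank near the top by p value while having a small effect size, which is one way that different selection criteria can disagree.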

Keywords: false discovery rate; mass spectrometry; ovarian cancer; p value; proteomics; reproducibility
Date: 2019

Downloads:
https://doi.org/10.1515/sagmb-2018-0039
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:18:y:2019:i:3:p:13:n:3

Ordering information: This journal article can be ordered from
https://www.degruyte ... urnal/key/sagmb/html

DOI: 10.1515/sagmb-2018-0039

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.