Reproducibility of biomarker identifications from mass spectrometry proteomic data in cancer studies
Liang Yulan (),
Kelemen Adam and
Additional contact information
Liang Yulan: Department of Family and Community Health, University of Maryland, Baltimore, MD 21201-1579, USA
Kelemen Adam: Department of Computer Science, University of Maryland, College Park, MD 20742, USA
Kelemen Arpad: Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201-1579, USA
Statistical Applications in Genetics and Molecular Biology, 2019, vol. 18, issue 3, 13
Reproducibility of disease signatures and clinical biomarkers in multi-omics disease analysis has been a key challenge due to a multitude of factors. The heterogeneity of the limited sample, various biological factors such as environmental confounders, and the inherent experimental and technical noises, compounded with the inadequacy of statistical tools, can lead to the misinterpretation of results, and subsequently very different biology. In this paper, we investigate the biomarker reproducibility issues, potentially caused by differences of statistical methods with varied distribution assumptions or marker selection criteria using Mass Spectrometry proteomic ovarian tumor data. We examine the relationship between effect sizes, p values, Cauchy p values, False Discovery Rate p values, and the rank fractions of identified proteins out of thousands in the limited heterogeneous sample. We compared the markers identified from statistical single features selection approaches with machine learning wrapper methods. The results reveal marked differences when selecting the protein markers from varied methods with potential selection biases and false discoveries, which may be due to the small effects, different distribution assumptions, and p value type criteria versus prediction accuracies. The alternative solutions and other related issues are discussed in supporting the reproducibility of findings for clinical actionable outcomes.
Keywords: false discovery rate; mass spectrometry; ovarian cancer; p value; proteomics; reproducibility (search for similar items in EconPapers)
References: Add references at CitEc
Citations: Track citations by RSS feed
Downloads: (external link)
For access to full text, subscription to the journal or payment for the individual article is required.
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:18:y:2019:i:3:p:13:n:3
Ordering information: This journal article can be ordered from
Access Statistics for this article
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla ().