
A statistical testing procedure for validating class labels

Melissa C. Key, Susanne Ragg and Benzion Boukai

Journal of Applied Statistics, 2023, vol. 50, issue 8, 1725-1749

Abstract: Motivated by an open problem of validating protein identities in label-free shotgun proteomics workflows, we present a testing procedure to validate class (protein) labels using available measurements across N instances (peptides). More generally, we present a non-parametric solution to the problem of identifying instances that are outliers relative to the subset of instances assigned to the same class. The primary assumption is that measured distances between instances within the same class are stochastically smaller than measured distances between instances from different classes. We show that the overall type I error probability across all instances within a class can be controlled at a fixed level (say α). We also give conditions under which similar results hold for the type II error probability. The theoretical results are supplemented by an extensive numerical study illustrating the applicability and viability of our method. Even with up to 25% of instances initially mislabeled, our testing procedure maintains high specificity and greatly reduces the proportion of mislabeled instances. The applicability and effectiveness of our testing procedure are further illustrated by a detailed example on a proteomics data set from children with sickle cell disease, in which five spike-in proteins acted as contrasting controls.
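
The distance assumption in the abstract can be illustrated with a small sketch. This is not the authors' procedure, which this record does not describe; it is a generic leave-one-out empirical p-value, shown only to make the idea concrete. The function name `empirical_p_values`, the choice of the median as a summary, the toy data, and the 0.1 threshold are all assumptions made for this example.

```python
from statistics import median


def empirical_p_values(points, labels, dist):
    """Return one empirical p-value per instance (illustrative only).

    A small value suggests the instance sits unusually far from the other
    instances that share its label. Hypothetical helper, not from the paper.
    """
    p_values = []
    for i, (x_i, c_i) in enumerate(zip(points, labels)):
        # Other instances carrying the same label as instance i.
        peers = [j for j, c in enumerate(labels) if c == c_i and j != i]
        if len(peers) < 2:
            p_values.append(1.0)  # too few peers to judge this instance
            continue
        # Score of i: median distance from i to its peers.
        score_i = median(dist(x_i, points[j]) for j in peers)
        # Reference scores: each peer's median distance to the remaining peers.
        ref = [
            median(dist(points[j], points[k]) for k in peers if k != j)
            for j in peers
        ]
        # Share of reference scores at least as large as the score of i.
        exceed = sum(r >= score_i for r in ref)
        p_values.append((1 + exceed) / (1 + len(ref)))
    return p_values


# Toy data (assumed): nine nearby values and one distant value, all labeled "A".
points = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 10.0]
labels = ["A"] * 10
p = empirical_p_values(points, labels, lambda a, b: abs(a - b))
flagged = [i for i, v in enumerate(p) if v <= 0.1]
print(flagged)
```

In this toy setting only the distant value receives a small p-value. The sketch makes no claim about the error-control guarantees stated in the abstract.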

Date: 2023

Downloads: (external link)
http://hdl.handle.net/10.1080/02664763.2022.2038546 (text/html)
Access to full text is restricted to subscribers.


Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:50:y:2023:i:8:p:1725-1749

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20

DOI: 10.1080/02664763.2022.2038546

Journal of Applied Statistics is currently edited by Robert Aykroyd

More articles in Journal of Applied Statistics from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.

 
Page updated 2025-03-20
Handle: RePEc:taf:japsta:v:50:y:2023:i:8:p:1725-1749