Prediction and outlier detection in classification problems
Leying Guan and Robert Tibshirani
Journal of the Royal Statistical Society Series B, 2022, vol. 84, issue 2, 524-546
Abstract:
We consider the multi‐class classification problem when the training data and the out‐of‐sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set C(x) as a subset of class labels, possibly empty. It tries to optimize the out‐of‐sample performance, aiming to include the correct class and to detect outliers x as often as possible. BCOPS returns no prediction (corresponding to C(x) equal to the empty set) if it infers x to be an outlier. The proposed method combines supervised learning algorithms with conformal prediction to minimize a misclassification loss averaged over the out‐of‐sample distribution. The constructed prediction sets have a finite sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given procedure. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
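The set-valued prediction described in the abstract can be illustrated with a minimal sketch of class-wise split conformal prediction. This is not the BCOPS procedure itself: BCOPS learns its scores with supervised learners to optimize out-of-sample performance, whereas the sketch below swaps in a toy distance-to-class-mean score on scalar features. All function names, the simulated data, and the level alpha = 0.05 are assumptions made for illustration only.

```python
import random
import statistics


def fit_class_means(train):
    """train: list of (x, label) pairs with scalar x. Returns {label: mean of x}."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    return {y: statistics.mean(xs) for y, xs in by_label.items()}


def conformity(x, mean):
    # Toy conformity score (an assumption, not the paper's score):
    # higher means x looks more typical of the class.
    return -abs(x - mean)


def calibrate(calib, means):
    """Conformity scores of held-out calibration points, grouped by true class."""
    scores = {y: [] for y in means}
    for x, y in calib:
        scores[y].append(conformity(x, means[y]))
    return scores


def prediction_set(x, means, calib_scores, alpha=0.05):
    """Return the labels whose conformal p-value exceeds alpha (possibly none)."""
    labels = set()
    for y, mean in means.items():
        s = conformity(x, mean)
        ref = calib_scores[y]
        # Class-wise conformal p-value: share of calibration points of class y
        # that conform no better than x, with the usual +1 correction.
        p_value = (1 + sum(r <= s for r in ref)) / (len(ref) + 1)
        if p_value > alpha:
            labels.add(y)
    return labels


rng = random.Random(0)


def sample(n):
    # Two simulated classes: "a" centred at 0 and "b" centred at 6.
    return ([(rng.gauss(0.0, 1.0), "a") for _ in range(n)]
            + [(rng.gauss(6.0, 1.0), "b") for _ in range(n)])


train, calib = sample(200), sample(200)
means = fit_class_means(train)
calib_scores = calibrate(calib, means)

typical_point = prediction_set(0.1, means, calib_scores)
far_point = prediction_set(50.0, means, calib_scores)
print(typical_point)  # a point near class "a"
print(far_point)      # a point far from both classes
```

In this sketch a point near the centre of class "a" receives the set containing only "a", while a point far from both classes receives the empty set, which plays the role of "no prediction" for an inferred outlier. Calibrating each class separately is what gives a per-class coverage guarantee at level 1 - alpha when test points of that class are exchangeable with its calibration points.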
Date: 2022
DOI: https://doi.org/10.1111/rssb.12443
Persistent link: https://EconPapers.repec.org/RePEc:bla:jorssb:v:84:y:2022:i:2:p:524-546
Journal of the Royal Statistical Society Series B is currently edited by P. Fryzlewicz and I. Van Keilegom