Testing for Outliers with Conformal P-Values
Stephen Bates, Emmanuel Candes, Lihua Lei, Yaniv Romano and Matteo Sesia
Author affiliations:
Stephen Bates: UC Berkeley
Emmanuel Candes: Stanford University
Lihua Lei: Stanford University
Yaniv Romano: Israel Institute of Technology
Matteo Sesia: University of Southern California
Research Papers from Stanford University, Graduate School of Business
Abstract:
This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
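The following is a minimal sketch of the marginal conformal p-value for outlier detection and of its use with the Benjamini-Hochberg (BH) procedure for false discovery rate control. It assumes an outlier score function has already been fitted on separate training data and that larger scores indicate more outlying points. The function names, the score convention, and the toy Gaussian data are illustrative assumptions rather than the authors' code. The calibration-conditional, mutually independent p-values mentioned in the abstract require a further adjustment that is not shown here.

```python
import random
from typing import List, Sequence


def conformal_pvalues(calib_scores: Sequence[float],
                      test_scores: Sequence[float]) -> List[float]:
    """Marginal conformal p-values (assumed convention: larger score = more outlying).

    For each test score t: p = (1 + #{i : calib_i >= t}) / (n + 1),
    where n is the number of held-out calibration (inlier) scores.
    """
    n = len(calib_scores)
    pvals = []
    for t in test_scores:
        count = sum(1 for s in calib_scores if s >= t)
        pvals.append((1 + count) / (n + 1))
    return pvals


def benjamini_hochberg(pvals: Sequence[float], alpha: float) -> List[int]:
    """Return the (sorted) indices rejected by the BH step-up procedure at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    k = 0  # largest rank whose p-value falls below its BH threshold
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical score: distance from 0; inliers are N(0, 1), outliers N(5, 1).
    calib = [abs(rng.gauss(0, 1)) for _ in range(1000)]
    inliers = [abs(rng.gauss(0, 1)) for _ in range(90)]
    outliers = [abs(rng.gauss(5, 1)) for _ in range(10)]
    p = conformal_pvalues(calib, inliers + outliers)
    # Indices 90..99 correspond to the simulated outliers.
    print(benjamini_hochberg(p, alpha=0.1))
```

Because every test point is compared against the same calibration set, the resulting p-values are dependent; the positive-dependence result stated in the abstract is what makes a step-up procedure such as BH applicable to them.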
Date: 2022-05
New Economics Papers: this item is included in nep-ecm
Downloads: https://doi.org/10.48550/arXiv.2104.08279
Persistent link: https://EconPapers.repec.org/RePEc:ecl:stabus:4027