Estimating the class prior for positive and unlabelled data via logistic regression

Małgorzata Łazęcka, Jan Mielniczuk and Paweł Teisseyre
Additional contact information
Małgorzata Łazęcka: Polish Academy of Sciences
Jan Mielniczuk: Polish Academy of Sciences
Paweł Teisseyre: Polish Academy of Sciences

Advances in Data Analysis and Classification, 2021, vol. 15, issue 4, No 9, 1039-1068

Abstract: In this paper, we revisit the problem of estimating the class prior probability from positive and unlabelled data gathered in a single-sample scenario. The task is important because, in the positive unlabelled setting, a classifier can be learned successfully when the class prior is available. We show that, without additional assumptions, the class prior probability is not identifiable, and thus the existing non-parametric estimators are necessarily biased in general. The magnitude of their bias is also investigated. The problem becomes identifiable when the probabilistic structure satisfies mild semi-parametric assumptions. Consequently, we propose a method based on a logistic fit and a concave minorization of its (non-concave) log-likelihood. Experiments conducted on artificial and benchmark datasets, as well as on the large clinical database MIMIC, indicate that the estimation errors of the proposed method are usually lower than those of its competitors and that the method is robust against departures from the logistic setting.

Keywords: Positive unlabelled learning; Class prior estimation; Logistic regression; Non-convex optimisation; Minorization-maximization algorithm; 62H30; 62J12
Date: 2021

Downloads: (external link)
http://link.springer.com/10.1007/s11634-021-00444-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:15:y:2021:i:4:d:10.1007_s11634-021-00444-9

Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2

DOI: 10.1007/s11634-021-00444-9

Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs

More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:advdac:v:15:y:2021:i:4:d:10.1007_s11634-021-00444-9