Combining Predictors for Classification Using the Area Under the ROC Curve
Margaret Pepe, Tianxi Cai and Zheng Zhang
Additional contact information
Margaret Pepe: University of Washington
Tianxi Cai: Harvard University
Zheng Zhang: University of Washington
No 1021, UW Biostatistics Working Paper Series from Berkeley Electronic Press
Abstract:
We compare simple logistic regression with an alternative robust procedure for constructing linear predictors to be used for the two-state classification task. Theoretical advantages of the robust procedure over logistic regression are: (i) although it assumes a generalized linear model for the dichotomous outcome variable, it does not require specification of the link function; (ii) it accommodates case-control designs even when the model is not logistic; and (iii) it yields sensible results even when the generalized linear model assumption fails to hold. Surprisingly, we find that the linear predictor derived from the logistic regression likelihood is very robust in the following sense: it yields prediction performance comparable with our theoretically robust procedure when the logistic model fails and even when the form of the linear predictor is incorrectly specified. This raises some intriguing questions about using logistic regression for prediction. Some preliminary explanations are given that draw from recent literature.

Next we suggest that it may not be necessary to fit the linear function over the whole predictor space to achieve adequate classification properties. Procedures that restrict modeling to a subspace defined by minimally acceptable false-positive and false-negative error rates are suggested. We find that relaxing linearity assumptions to a subspace confers further robustness and that the logistic likelihood calculated over the restricted region provides a robust objective function for determining classification rules.

Overall, our new procedure performs well but not substantially better than logistic regression. Further work is warranted to clarify the relationship between the two conceptually distinct procedures, and may provide a new conceptual basis for using the logistic likelihood to combine predictors.

Note: This Working Paper is a revised version of the previously posted "Robust Binary Regression for Optimally Combining Predictors."
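As a rough illustration of what combining predictors by the area under the ROC curve can look like, the sketch below picks the direction of a two-predictor linear score that maximizes the empirical (Mann-Whitney) AUC. It is not taken from the paper: the abstract does not spell out the robust procedure, so the grid search over directions, the function names `empirical_auc` and `best_linear_combination`, and the simulated normal data are all assumptions made for illustration.

```python
import numpy as np


def empirical_auc(scores_cases, scores_controls):
    """Mann-Whitney estimate of P(case score > control score); ties count 1/2."""
    diff = scores_cases[:, None] - scores_controls[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def best_linear_combination(x_cases, x_controls, n_angles=360):
    """Grid search over unit directions (cos t, sin t) for two predictors.

    Hypothetical helper: returns the direction with the largest empirical AUC
    among the grid points, together with that AUC.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    best_auc, best_w = -1.0, None
    for t in angles:
        w = np.array([np.cos(t), np.sin(t)])
        auc = empirical_auc(x_cases @ w, x_controls @ w)
        if auc > best_auc:
            best_auc, best_w = auc, w
    return best_w, best_auc


# Simulated data (an assumption, not data from the paper):
# controls centered at the origin, cases shifted in both predictors.
rng = np.random.default_rng(0)
x_controls = rng.normal(size=(200, 2))
x_cases = rng.normal(size=(200, 2)) + np.array([1.0, 0.5])

w, auc = best_linear_combination(x_cases, x_controls)
auc_first_predictor = empirical_auc(x_cases[:, 0], x_controls[:, 0])
print("direction:", w, "AUC:", auc, "AUC of predictor 1 alone:", auc_first_predictor)
```

Only the direction of the coefficient vector matters here, because the AUC is unchanged by rescaling the score with a positive constant. That is why the search runs over unit vectors rather than over unconstrained coefficients.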
Keywords: Classification; Discriminant Analysis; Receiver Operating Characteristic Curve; Logistic; Likelihood
Date: 2004-08-16
Note: oai:bepress.com:uwbiostat-1021
Downloads:
http://www.bepress.com/cgi/viewcontent.cgi?article=1021&context=uwbiostat (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:bep:uwabio:1021
Bibliographic data for series maintained by Christopher F. Baum.