On the Effect and Remedies of Shrinkage on Classification Probability Estimation

Zhang, Chong; Liu, Yufeng; Wu, Zhengxiao

The American Statistician, 2013, vol. 67, issue 3, 134-142

Abstract: Shrinkage methods have been shown to be effective for classification problems. As a form of regularization, shrinkage through penalization helps to avoid overfitting and produces accurate classifiers for prediction, especially when the dimension is relatively high. Despite the benefit of shrinkage for the classification accuracy of the resulting classifiers, in this article we demonstrate that shrinkage introduces bias into classification probability estimates. In many cases this bias can be large and consequently yield poor class probability estimates when the sample size is small or moderate. We offer some theoretical insights into the effect of shrinkage and provide remedies for better class probability estimation. Using penalized logistic regression and proximal support vector machines as examples, we demonstrate on several simulated and real data examples that our proposed refit method gives similar classification accuracy and remarkable improvements in probability estimation.
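The abstract's central point, that shrinkage can leave the decision rule intact while distorting the estimated probabilities, can be seen in a toy setting. The following is a minimal sketch under our own assumptions: a one-feature logistic model with a ridge (L2) penalty on the slope only, a small made-up dataset, and a hypothetical helper `fit_logistic` fitted by Newton's method. The refit step shown (re-estimating an unpenalized logistic link on the penalized score) is one simple variant chosen for illustration; it is not necessarily the refit procedure proposed in the article.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lam=0.0, iters=100, tol=1e-12):
    """Hypothetical helper: fit p(x) = sigmoid(b0 + b1*x) by Newton's method.

    Minimizes the negative log-likelihood plus (lam/2)*b1**2, i.e. an L2
    (ridge) penalty on the slope only; the intercept is unpenalized.
    lam=0 gives ordinary maximum likelihood.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += p - y          # gradient w.r.t. intercept
            g1 += (p - y) * x    # gradient w.r.t. slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        g1 += lam * b1           # penalty gradient
        h11 += lam               # penalty curvature
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        d1 = (h00 * g1 - h01 * g0) / det
        b0 -= d0
        b1 -= d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy 1-D data (hypothetical, not from the article); classes overlap, so the MLE exists.
xs = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

mle = fit_logistic(xs, ys, lam=0.0)  # unpenalized fit
pen = fit_logistic(xs, ys, lam=2.0)  # shrunken (ridge-penalized) fit

# Refit (illustrative variant): keep the penalized score f(x) = b0 + b1*x,
# which fixes the decision rule, and re-estimate its scale and offset by
# unpenalized logistic regression of y on f(x).
scores = [pen[0] + pen[1] * x for x in xs]
ref = fit_logistic(scores, ys, lam=0.0)

p_mle = [sigmoid(mle[0] + mle[1] * x) for x in xs]
p_pen = [sigmoid(pen[0] + pen[1] * x) for x in xs]
p_ref = [sigmoid(ref[0] + ref[1] * s) for s in scores]

print(f"slope: MLE {mle[1]:.3f}, penalized {pen[1]:.3f}")
print(f"P(y=1 | x=3): MLE {p_mle[-1]:.3f}, penalized {p_pen[-1]:.3f}, refit {p_ref[-1]:.3f}")
```

Because the refit is an increasing function of the penalized score, the predicted classes do not change; only the probability scale moves away from 0.5. In this one-feature case the refit probabilities coincide with the unpenalized fit, since the score is an affine function of x; with many features and a shrunken or sparse fit the two generally differ.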

Date: 2013

Full text: http://hdl.handle.net/10.1080/00031305.2013.817356 (access restricted to subscribers)

Persistent link: https://EconPapers.repec.org/RePEc:taf:amstat:v:67:y:2013:i:3:p:134-142

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/UTAS20

DOI: 10.1080/00031305.2013.817356

The American Statistician is currently edited by Eric Sampson

More articles in The American Statistician from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.