Input-dependent estimation of generalization error under covariate shift

Masashi Sugiyama and Klaus-Robert Müller

Statistics & Risk Modeling, 2005, vol. 23, issue 4, 249-279

Abstract: A common assumption in supervised learning is that the training and test input points follow the same probability distribution. However, this assumption is not fulfilled, e.g., in interpolation, extrapolation, active learning, or classification with imbalanced data. The violation of this assumption—known as the covariate shift—causes a heavy bias in standard generalization error estimation schemes such as cross-validation or Akaike's information criterion, and thus they result in poor model selection. In this paper, we propose an alternative estimator of the generalization error for the squared loss function when training and test distributions are different. The proposed generalization error estimator is shown to be exactly unbiased for finite samples if the learning target function is realizable and asymptotically unbiased in general. We also show that, in addition to the unbiasedness, the proposed generalization error estimator can accurately estimate the difference of the generalization error among different models, which is a desirable property in model selection. Numerical studies show that the proposed method compares favorably with existing model selection methods in regression for extrapolation and in classification with imbalanced data.
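The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the estimator proposed in the paper; it uses the generic importance-weighted hold-out estimate, with weights w(x) = p_test(x) / p_train(x), to show how an unweighted error estimate computed on training-distribution data can understate the error under the test input distribution. The target function (sinc), the two Gaussian input densities, the noise level, and all function names are assumptions chosen for illustration, and both densities are assumed known. Errors here are prediction errors on noisy outputs, so they include the noise variance.

```python
import math
import random


def sinc(x):
    """Toy target function (an assumption for this example only)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sample(rng, n, mean, std, noise_std=0.1):
    """Draw n noisy (x, y) pairs with inputs from N(mean, std^2)."""
    xs = [rng.gauss(mean, std) for _ in range(n)]
    ys = [sinc(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def squared_errors(model, xs, ys):
    """Per-sample squared loss of the fitted line."""
    a, b = model
    return [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]


TRAIN = (1.0, 0.5)   # assumed training input density N(1, 0.5^2)
TEST = (2.0, 0.25)   # assumed test input density N(2, 0.25^2)

rng = random.Random(0)
model = fit_line(*sample(rng, 200, *TRAIN))

# Hold-out data drawn from the TRAINING input distribution.
val_x, val_y = sample(rng, 5000, *TRAIN)
val_loss = squared_errors(model, val_x, val_y)
plain_estimate = sum(val_loss) / len(val_loss)

# Importance weights w(x) = p_test(x) / p_train(x), densities assumed known.
weights = [normal_pdf(x, *TEST) / normal_pdf(x, *TRAIN) for x in val_x]
weighted_estimate = sum(w * l for w, l in zip(weights, val_loss)) / len(val_loss)

# Reference value: Monte Carlo error on fresh data from the TEST distribution.
test_x, test_y = sample(rng, 20000, *TEST)
test_loss = squared_errors(model, test_x, test_y)
test_error = sum(test_loss) / len(test_loss)

print(f"plain hold-out estimate:      {plain_estimate:.3f}")
print(f"importance-weighted estimate: {weighted_estimate:.3f}")
print(f"Monte Carlo test error:       {test_error:.3f}")
```

In this extrapolation setting the unweighted estimate reflects only how well the line fits where training inputs are dense, while the weighted estimate targets the error where test inputs fall. Importance weighting can have high variance when the two densities overlap poorly, which is one reason alternative estimators are of interest.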

Keywords: linear regression; generalization error; model selection; covariate shift; sample selection bias; interpolation; extrapolation; active learning; classification with imbalanced data
Date: 2005



Persistent link: https://EconPapers.repec.org/RePEc:bpj:strimo:v:23:y:2005:i:4/2005:p:249-279:n:1


DOI: 10.1524/stnd.2005.23.4.249


Statistics & Risk Modeling is currently edited by Robert Stelzer

