
Impact of Regressand Stratification in Dataset Shift Caused by Cross-Validation

José A. Sáez and José L. Romero-Béjar
Additional contact information
José A. Sáez: Department of Statistics and Operations Research, University of Granada, Fuentenueva s/n, 18071 Granada, Spain
José L. Romero-Béjar: Department of Statistics and Operations Research, University of Granada, Fuentenueva s/n, 18071 Granada, Spain

Mathematics, 2022, vol. 10, issue 14, 1-14

Abstract: Data that have not been modeled cannot be correctly predicted. Under this assumption, this research studies how k-fold cross-validation can introduce dataset shift in regression problems. Such shift means that the data distributions of the training and test sets differ, which in turn degrades the estimation of model performance. Although stratification of the output variable is widely used in classification to reduce the impact of dataset shift induced by cross-validation, its use in regression is not widespread in the literature. This paper analyzes the consequences for dataset shift of including different regressand stratification schemes in cross-validation with regression data. The results show that these schemes create more similar training and test sets, reducing the presence of dataset shift related to cross-validation. Using the highest numbers of strata improves both the bias and deviation of the performance estimates obtained by regression algorithms and the number of cross-validation repetitions needed to obtain these better results.
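
The abstract does not say which stratification schemes the paper evaluates, so the following is only an illustrative sketch of one common way to stratify a continuous target for k-fold cross-validation, not the authors' method. It sorts the samples by target value, cuts the sorted order into roughly equal-sized strata, and deals each stratum's members across the folds. The function name `stratified_regression_folds` and its parameters `k`, `n_strata`, and `seed` are hypothetical names chosen for this example, and the log-normal target is synthetic data.

```python
import random
from statistics import mean


def stratified_regression_folds(y, k=5, n_strata=10, seed=0):
    """Assign each sample index to one of k folds, stratifying on the target y.

    Hypothetical helper for illustration only; assumes len(y) >= n_strata.
    """
    rng = random.Random(seed)
    n = len(y)
    # Indices ordered by increasing target value.
    order = sorted(range(n), key=lambda i: y[i])
    folds = [[] for _ in range(k)]
    offset = 0
    for s in range(n_strata):
        # Stratum s is a contiguous slice of the sorted indices.
        start = s * n // n_strata
        stop = (s + 1) * n // n_strata
        stratum = order[start:stop]
        # Shuffle inside the stratum so fold membership is random.
        rng.shuffle(stratum)
        # Deal members round-robin; the running offset keeps fold sizes balanced.
        for j, idx in enumerate(stratum):
            folds[(offset + j) % k].append(idx)
        offset += len(stratum)
    return folds


# Synthetic, skewed target values (an assumption made for the demo).
demo_rng = random.Random(42)
y = [demo_rng.lognormvariate(0.0, 1.0) for _ in range(200)]

folds = stratified_regression_folds(y, k=5, n_strata=20, seed=1)
for fold_id, test_idx in enumerate(folds):
    # Each fold serves once as the test set; the rest form the training set.
    fold_mean = mean(y[i] for i in test_idx)
    print(fold_id, len(test_idx), round(fold_mean, 3))
```

Each fold receives members from every stratum, so the test-set target distribution should track the training-set distribution more closely than under a purely random split. Raising `n_strata` tightens that match, which is the direction the abstract reports as beneficial.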

Keywords: cross-validation; dataset shift; target shift; stratification; regression
JEL-codes: C
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/14/2538/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/14/2538/ (text/html)


Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:14:p:2538-:d:868165


Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:10:y:2022:i:14:p:2538-:d:868165