Optimal Data Splitting for Holdout Cross-Validation in Large Covariance Matrix Estimation
Lamia Lamrani,
Christian Bongiorno and
Marc Potters
Papers from arXiv.org
Abstract:
Cross-validation is a statistical tool that can be used to improve large covariance matrix estimation. Although its efficiency is observed in practical applications, the theoretical reasons behind it remain largely intuitive, with formal proofs currently lacking. To carry on analytical analysis, we focus on the holdout method, a single iteration of cross-validation, rather than the traditional $k$-fold approach. We derive a closed-form expression for the estimation error when the population matrix follows a white inverse Wishart distribution, and we observe the optimal train-test split scales as the square root of the matrix dimension. For general population matrices, we connected the error to the variance of eigenvalues distribution, but approximations are necessary. Interestingly, in the high-dimensional asymptotic regime, both the holdout and $k$-fold cross-validation methods converge to the optimal estimator when the train-test ratio scales with the square root of the matrix dimension.
Date: 2025-03
References: Add references at CitEc
Citations:
Downloads: (external link)
http://arxiv.org/pdf/2503.15186 Latest version (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2503.15186
Access Statistics for this paper
More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators (help@arxiv.org).