EconPapers    
Economics at your fingertips  
 

Test Set Sizing via Random Matrix Theory

Alexander Dubbs ()

SN Operations Research Forum, 2024, vol. 5, issue 1, 1-17

Abstract: Abstract This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression with m data points, each an independent n-dimensional multivariate Gaussian. It defines “ideal” as satisfying the integrity metric, i.e., the empirical model error is the actual measurement noise, and thus fairly reflects the value or lack of same of the model. This paper is the first to solve for the training and test set sizes for any model in a way that is truly optimal. The number of data points in the training set is the root of a quartic polynomial Theorem 1 derives which depends only on m and n; the covariance matrix of the multivariate Gaussian, the true model parameters, and the true measurement noise drop out of the calculations. The critical mathematical difficulties were realizing that the problems herein were discussed in the context of the Jacobi Ensemble, a probability distribution describing the eigenvalues of a known random matrix model, many integrals over which are known, since they are in the style of Selberg and Aomoto. Mathematical results are supported with thorough computational evidence. This paper is a step towards automatic choices of training/test set sizes in machine learning.

Keywords: Jacobi Ensemble; Linear model; Machine learning; Random Matrix Theory; Test set size (search for similar items in EconPapers)
Date: 2024
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
http://link.springer.com/10.1007/s43069-024-00292-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:snopef:v:5:y:2024:i:1:d:10.1007_s43069-024-00292-1

Ordering information: This journal article can be ordered from
https://www.springer.com/journal/43069

DOI: 10.1007/s43069-024-00292-1

Access Statistics for this article

SN Operations Research Forum is currently edited by Marco Lübbecke

More articles in SN Operations Research Forum from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-04-12
Handle: RePEc:spr:snopef:v:5:y:2024:i:1:d:10.1007_s43069-024-00292-1