Simultaneous Learning the Dimension and Parameter of a Statistical Model with Big Data
Long Wang,
Fangzheng Xie and
Yanxun Xu
Additional contact information
Long Wang: Johns Hopkins University
Fangzheng Xie: Johns Hopkins University
Yanxun Xu: Johns Hopkins University
Statistics in Biosciences, 2023, vol. 15, issue 3, No 4, 583-607
Abstract:
Estimating the dimension of a model along with its parameters is fundamental to many statistical learning problems. Traditional model selection methods often approach this task with a two-step procedure: first estimate the model parameters under every candidate model dimension, then select the best model dimension based on an information criterion. When the number of candidate models is large, however, this two-step procedure is highly inefficient and does not scale. We develop a novel automated and scalable approach with theoretical guarantees, called mixed-binary simultaneous perturbation stochastic approximation (MB-SPSA), to simultaneously estimate the dimension and parameters of a statistical model. To demonstrate the broad practicability of the MB-SPSA algorithm, we apply it to various classic statistical models, including K-means clustering, Gaussian mixture models with an unknown number of components, sparse linear regression, and latent factor models with an unknown number of factors. We evaluate the performance of MB-SPSA in terms of accuracy, running time, and scalability through simulation studies and an application to a single-cell sequencing dataset. The code implementing MB-SPSA is available at http://github.com/wanglong24/MB-SPSA.
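For orientation only, the sketch below shows the classical, fully continuous SPSA iteration that the MB-SPSA name builds on: each step estimates the gradient from two loss evaluations at randomly perturbed points. It is not the authors' mixed-binary algorithm, which this record does not describe. The function name `spsa_minimize`, the gain constants, and the quadratic test loss are all illustrative assumptions.

```python
import numpy as np

def spsa_minimize(loss, theta0, n_iter=2000, a=0.2, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Classical SPSA for a continuous parameter vector (illustrative only)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iter):
        # Decaying gain sequences: step size a_k and perturbation size c_k.
        a_k = a / (k + 1 + A) ** alpha
        c_k = c / (k + 1) ** gamma
        # Perturb all coordinates at once with independent +/-1 draws.
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # Two loss evaluations give a gradient estimate for every coordinate.
        diff = loss(theta + c_k * delta) - loss(theta - c_k * delta)
        grad_hat = diff / (2.0 * c_k * delta)
        theta -= a_k * grad_hat
    return theta

# Hypothetical test problem: a quadratic loss minimized at `target`.
target = np.array([1.0, -2.0, 0.5])
theta_hat = spsa_minimize(lambda t: np.sum((t - target) ** 2), np.zeros(3))
```

The cost per iteration is two loss evaluations regardless of the number of parameters, which is what makes this family of methods attractive when the loss is expensive or is evaluated on mini-batches.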
Keywords: Clustering; Mixed-binary optimization; Mini-batch learning; Single-cell sequencing; Stochastic optimization
Date: 2023
Downloads: http://link.springer.com/10.1007/s12561-021-09324-4 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:stabio:v:15:y:2023:i:3:d:10.1007_s12561-021-09324-4
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/12561
DOI: 10.1007/s12561-021-09324-4
Statistics in Biosciences is currently edited by Hongyu Zhao and Xihong Lin
More articles in Statistics in Biosciences from Springer, International Chinese Statistical Association
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.