Robust and parallel Bayesian model selection
Michael Minyi Zhang, Henry Lam and Lizhen Lin
Computational Statistics & Data Analysis, 2018, vol. 127, issue C, 229-247
Abstract:
Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden of handling large datasets that cannot be stored or processed on a single machine. Another challenge is the presence of outliers and contamination that damage the quality of inference. The parallel “divide and conquer” model selection strategy divides the observations of the full dataset into roughly equal subsets and performs inference and model selection independently on each subset. After local subset inference, the method aggregates the posterior model probabilities or other model/variable selection criteria to obtain a final model by using the notion of the geometric median. This approach leads to improved concentration in finding the “correct” model and model parameters and is also provably robust to outliers and data contamination.
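To make the aggregation step concrete, the following is a minimal sketch (not the authors' implementation) of combining subset posterior model probabilities via a geometric median computed with Weiszfeld's algorithm. The number of models, number of subsets, and probability values are made-up illustrative inputs; in practice the subset probabilities would come from independent Bayesian model selection runs on each data subset.

```python
import numpy as np

def geometric_median(points, tol=1e-8, max_iter=1000):
    """Weiszfeld's algorithm: find the point minimizing the sum of
    Euclidean distances to the rows of `points`."""
    y = points.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(max_iter):
        d = np.linalg.norm(points - y, axis=1)
        d = np.where(d < tol, tol, d)          # guard against division by zero
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return y

# Hypothetical example: posterior probabilities over 3 candidate models,
# computed independently on 5 data subsets (one row per subset).
subset_probs = np.array([
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.72, 0.18, 0.10],
    [0.05, 0.05, 0.90],   # a contaminated subset favoring the wrong model
    [0.68, 0.22, 0.10],
])

agg = geometric_median(subset_probs)
agg /= agg.sum()  # renormalize to a probability vector
print("aggregated model probabilities:", agg)
print("selected model index:", np.argmax(agg))
```

Unlike a simple average, the geometric median is not dragged far from the majority of subsets by the single contaminated row, which illustrates the robustness claim in the abstract.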
Keywords: Machine learning; Bayesian statistics; Model selection; Scalable inference
Date: 2018
Downloads: http://www.sciencedirect.com/science/article/pii/S0167947318301257 (full text for ScienceDirect subscribers only)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:127:y:2018:i:c:p:229-247
DOI: 10.1016/j.csda.2018.05.016
Computational Statistics & Data Analysis is currently edited by S.P. Azen
More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu.