Automated Bayesian variable selection methods for binary regression models with missing covariate data

Michael Bergrab and Christian Aßmann
Additional contact information
Michael Bergrab: Leibniz Institute for Educational Trajectories
Christian Aßmann: Leibniz Institute for Educational Trajectories

AStA Wirtschafts- und Sozialstatistisches Archiv, 2024, vol. 18, issue 2, No 4, 203-244

Abstract: Data collection and the availability of large data sets have increased over the last decades. In both statistical and machine learning frameworks, two methodological issues typically arise when performing regression analysis on large data sets. First, variable selection is crucial in regression modeling, as it helps to identify an appropriate model with respect to the considered set of conditioning variables. Second, especially in the context of survey data, handling missing values, which occur even with state-of-the-art data collection and processing methods, is important for estimation. Within this paper, we provide a Bayesian approach based on a spike-and-slab prior for the regression coefficients, which allows for simultaneous handling of variable selection and estimation in combination with handling of missing values in covariate data. The paper also discusses the implementation of the approach using Markov chain Monte Carlo techniques and provides results for simulated data sets and an empirical illustration based on data from the German National Educational Panel Study. The suggested Bayesian approach is compared to other statistical and machine learning frameworks such as Lasso, ridge regression, and elastic net, and is shown to perform well in terms of estimation performance and variable selection accuracy. The simulation results demonstrate that ignoring missing values in data sets can lead to biased selection results. Overall, the proposed Bayesian method offers a holistic, flexible, and powerful framework for variable selection in the presence of missing covariate data.
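
To make the spike-and-slab idea mentioned in the abstract concrete, the following is a minimal sketch of a generic continuous spike-and-slab prior: each coefficient gets an inclusion indicator, and its value is drawn from a narrow "spike" normal (effectively excluded) or a wide "slab" normal (included). The abstract does not state the exact prior specification, so this mixture form, the function names, and the hyperparameter values (w, sd_spike, sd_slab) are all illustrative assumptions and not the authors' implementation. The sketch covers only the prior and the conditional inclusion probability of a single coefficient; it does not include the binary regression likelihood, the MCMC sampler, or the treatment of missing covariates.

```python
import math
import random


def draw_from_prior(n_coef, w, sd_spike, sd_slab, rng):
    """Draw (gammas, betas) from a continuous spike-and-slab prior.

    Assumed form (illustrative, not taken from the paper):
      gamma_j ~ Bernoulli(w)
      beta_j  ~ N(0, sd_slab^2)  if gamma_j == 1 (variable included)
      beta_j  ~ N(0, sd_spike^2) if gamma_j == 0 (variable effectively excluded)
    """
    gammas, betas = [], []
    for _ in range(n_coef):
        gamma = 1 if rng.random() < w else 0
        sd = sd_slab if gamma == 1 else sd_spike
        gammas.append(gamma)
        betas.append(rng.gauss(0.0, sd))
    return gammas, betas


def inclusion_probability(beta, w, sd_spike, sd_slab):
    """P(gamma = 1 | beta) under the two-component normal mixture.

    Computed via the log-odds of slab versus spike to avoid dividing
    two densities that may both underflow to zero.
    """
    log_odds = (
        math.log(w / (1.0 - w))
        + math.log(sd_spike / sd_slab)
        - 0.5 * beta ** 2 * (1.0 / sd_slab ** 2 - 1.0 / sd_spike ** 2)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is reproducible
    gammas, betas = draw_from_prior(5, w=0.5, sd_spike=0.05, sd_slab=2.0, rng=rng)
    for g, b in zip(gammas, betas):
        p = inclusion_probability(b, w=0.5, sd_spike=0.05, sd_slab=2.0)
        print(f"gamma={g} beta={b:+.3f} P(gamma=1 | beta)={p:.3f}")
```

Under these assumed hyperparameters, a coefficient near zero is attributed mostly to the spike, while a coefficient far from zero is attributed almost entirely to the slab. In a sampler of this general type, the indicator for each variable would be updated from such a conditional probability, and the share of draws with gamma = 1 would serve as an estimate of the posterior inclusion probability.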

Keywords: Bayesian Estimation; Variable Selection; Model Selection; Sparsity; Spike-and-Slab Prior; Elastic Net; Shrinkage; Estimation
JEL classification: C11; C18; C55
Date: 2024

Downloads: (external link)
http://link.springer.com/10.1007/s11943-024-00345-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:astaws:v:18:y:2024:i:2:d:10.1007_s11943-024-00345-1

Ordering information: This journal article can be ordered from
http://www.springer. ... ce/journal/11943/PS2

DOI: 10.1007/s11943-024-00345-1

AStA Wirtschafts- und Sozialstatistisches Archiv is currently edited by Ralf Münnich

More articles in AStA Wirtschafts- und Sozialstatistisches Archiv from Springer, Deutsche Statistische Gesellschaft - German Statistical Society
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:astaws:v:18:y:2024:i:2:d:10.1007_s11943-024-00345-1