Robust Variable Selection with Optimality Guarantees for High-Dimensional Logistic Regression

Insolia, Luca; Kenney, Ana; Calovi, Martina; Chiaromonte, Francesca

Additional contact information
Luca Insolia: Faculty of Sciences, Scuola Normale Superiore, 56126 Pisa, Italy
Ana Kenney: Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA
Martina Calovi: Department of Geography, Norwegian University of Science and Technology, 7491 Trondheim, Norway
Francesca Chiaromonte: Institute of Economics & EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy

Stats, 2021, vol. 4, issue 3, 1-17

Abstract: High-dimensional classification studies have become widespread across various domains. The large dimensionality, coupled with the possible presence of data contamination, motivates the use of robust, sparse estimation methods to improve model interpretability and ensure that the majority of observations agree with the underlying parametric model. In this study, we propose a robust and sparse estimator for logistic regression models, which simultaneously tackles the presence of outliers and/or irrelevant features. Specifically, we propose the use of L0-constraints and mixed-integer conic programming techniques to solve the underlying double combinatorial problem in a framework that allows one to pursue optimality guarantees. We use our proposal to investigate the main drivers of honey bee (Apis mellifera) loss through the annual winter loss survey data collected by the Pennsylvania State Beekeepers Association. Previous studies focused mainly on predictive performance; our approach, by contrast, produces a more interpretable classification model and provides evidence for several outlying observations within the survey data. We compare our proposal with existing heuristic methods and non-robust procedures, demonstrating its effectiveness. In addition to the application to honey bee loss, we present a simulation study in which our proposal outperforms other methods across most performance measures and settings.
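
Illustrative formulation (not taken from the abstract): one way to write the double combinatorial problem described above, assuming a mean-shift parameterization in the spirit of the logistic slippage model named in the keywords. The symbols beta, phi, k_p and k_n, and the label coding y_i in {-1, +1}, are assumptions introduced here for illustration; the article's own notation and constraints may differ.

```latex
% Assumed notation: x_i is the feature vector of observation i,
% y_i \in \{-1, +1\} its label, \beta the coefficients, \phi the mean shifts.
\min_{\beta \in \mathbb{R}^{p},\; \phi \in \mathbb{R}^{n}}
\sum_{i=1}^{n} \log\!\bigl(1 + \exp\bigl(-y_i\,(x_i^{\top}\beta + \phi_i)\bigr)\bigr)
\quad \text{subject to} \quad
\|\beta\|_0 \le k_p, \qquad \|\phi\|_0 \le k_n
```

Under these assumptions, a nonzero phi_i flags observation i as a potential outlier and a nonzero beta_j marks feature j as selected, so the two L0 bounds cap the number of flagged observations and selected features, respectively.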

Keywords: classification; logistic slippage model; mixed-integer conic programming; model selection; honey bee loss; outlier detection; robust estimation
JEL-codes: C1 C10 C11 C14 C15 C16
Date: 2021

Downloads:
https://www.mdpi.com/2571-905X/4/3/40/pdf (application/pdf)
https://www.mdpi.com/2571-905X/4/3/40/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jstats:v:4:y:2021:i:3:p:40-681:d:626244

Stats is currently edited by Mrs. Minnie Li
