Ensemble with Divisive Bagging for Feature Selection in Big Data
Yousung Park and Tae Yeon Kwon
Additional contact information
Yousung Park: Korea University
Tae Yeon Kwon: Hankuk University of Foreign Studies
Computational Economics, 2025, vol. 66, issue 2, No 12, 1354 pages
Abstract:
We introduce Ensemble with Divisive Bagging (EDB), a new feature selection method for linear models that addresses the excessive selection of features in big data caused by deflated p-values. Extensive simulations show that EDB yields parsimonious models with no loss of predictive performance relative to lasso, ridge, elastic-net, LARS, and FS. We also show that EDB estimates feature importance in linear models more accurately than Random Forest, XGBoost, and CatBoost. Additionally, we apply EDB to feature selection in models for house prices and loan defaults. Our findings highlight the advantages of EDB: (1) it effectively addresses deflated p-values and prevents the inclusion of extraneous features; (2) it ensures unbiased coefficient estimation; (3) it adapts to various models that rely on p-value-based inference; (4) it constructs statistically explainable models with feature attribution and importance by preserving inference based on a linear model and p-values; and (5) it can be applied to linear economic models without altering the existing functional form of the model.
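The abstract does not spell out the EDB algorithm, so the following is only a minimal sketch of the general idea it names: p-values shrink as the sample grows, and testing on smaller disjoint subsets and then voting can keep a negligible effect out of the model. Everything specific here is an assumption made for illustration, not the authors' procedure: the simulated data, the coefficient values, the number of blocks (100), the significance level (0.05), the majority threshold (0.5), and the helper names `slope_p_value` and `vote_share`. Per-feature simple regressions with a normal approximation stand in for a full linear model, which is reasonable here only because the simulated features are independent.

```python
import math
import random


def slope_p_value(x, y):
    """Two-sided p-value for the slope in a simple regression of y on x.

    Uses a normal approximation to the t statistic, adequate for the
    sample sizes used below.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = slope / se
    return math.erfc(abs(z) / math.sqrt(2))


def vote_share(x, y, n_blocks, alpha=0.05):
    """Share of disjoint, equally sized blocks in which the slope is significant."""
    size = len(x) // n_blocks
    hits = 0
    for k in range(n_blocks):
        lo, hi = k * size, (k + 1) * size
        if slope_p_value(x[lo:hi], y[lo:hi]) < alpha:
            hits += 1
    return hits / n_blocks


# Simulated data (assumed values): one strong feature, one negligible feature.
rng = random.Random(0)
n = 100_000
x_strong = [rng.gauss(0, 1) for _ in range(n)]
x_weak = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 * a + 0.03 * b + rng.gauss(0, 1) for a, b in zip(x_strong, x_weak)]

# On the full sample even the negligible effect has a very small p-value.
full_p_weak = slope_p_value(x_weak, y)

# The rows are already i.i.d., so contiguous blocks are a random partition.
share_strong = vote_share(x_strong, y, n_blocks=100)
share_weak = vote_share(x_weak, y, n_blocks=100)

# Majority vote across blocks (threshold 0.5 is an assumption).
shares = {"x_strong": share_strong, "x_weak": share_weak}
selected = [name for name, share in shares.items() if share > 0.5]

print("full-sample p-value, weak feature:", full_p_weak)
print("vote shares:", shares)
print("selected:", selected)
```

In this simulation the weak feature passes a conventional 5% test on the full sample, yet it is significant in only a minority of the 1,000-row blocks, so the vote drops it while the strong feature is retained.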
Keywords: Feature selection; Bagging; Voting system; Ensemble; Big data; Feature importance
JEL-codes: C15 C51 C52 C55 C63 C80
Date: 2025
Downloads:
http://link.springer.com/10.1007/s10614-024-10741-y (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:kap:compec:v:66:y:2025:i:2:d:10.1007_s10614-024-10741-y
DOI: 10.1007/s10614-024-10741-y
Computational Economics is currently edited by Hans Amman