Subsemble: an ensemble method for combining subset-specific algorithm fits
Stephanie Sapp,
Mark J. van der Laan and
John Canny
Journal of Applied Statistics, 2014, vol. 41, issue 6, 1247-1259
Abstract:
Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive data sets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large data sets. Subsemble partitions the full data set into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V -fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be a beneficial tool for small- to moderate-sized data sets, and often has better prediction performance than the underlying algorithm fit just once on the full data set. We also describe how to include Subsemble as a candidate in a SuperLearner library, providing a practical way to evaluate the performance of Subsemble relative to the underlying algorithm fit just once on the full data set.
Date: 2014
References: View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
http://hdl.handle.net/10.1080/02664763.2013.864263 (text/html)
Access to full text is restricted to subscribers.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:41:y:2014:i:6:p:1247-1259
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20
DOI: 10.1080/02664763.2013.864263
Access Statistics for this article
Journal of Applied Statistics is currently edited by Robert Aykroyd
More articles in Journal of Applied Statistics from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst ().