EconPapers    
Economics at your fingertips  
 

A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers

Tengyuan Liang () and Pragya Sur ()
Additional contact information
Tengyuan Liang: University of Chicago - Booth School of Business
Pragya Sur: Harvard University - Department of Statistics

No 2020-152, Working Papers from Becker Friedman Institute for Research In Economics

Abstract: This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives. We consider the setting where the number of features (weak learners) p scales with the sample size n, in an over-parametrized regime. Under a broad class of statistical models, we provide an exact analysis of the generalization error of boosting, when the algorithm interpolates the training data and maximizes the empirical L1-margin. The relation between the boosting test error and the optimal Bayes error is pinned down explicitly. In turn, these precise characterizations resolve several open questions raised in [15, 81] surrounding boosting. On the computational front, we provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1 margin. Furthermore, we discover that the larger the overparametrization ratio p/n, the smaller the proportion of active features (with zero initialization), and the faster the optimization reaches interpolation. At the heart of our theory lies an in-depth study of the maximum L1-margin, which can be accurately described by a new system of non-linear equations; we analyze this margin and the properties of this system, using Gaussian comparison techniques and a novel uniform deviation argument. Variants of AdaBoost corresponding to general Lq geometry, for q > 1, are also presented, together with an exact analysis of the high-dimensional generalization and optimization behavior of a class of these algorithms.

Pages: 49 pages
Date: 2020
New Economics Papers: this item is included in nep-cmp and nep-ecm
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://repec.bfi.uchicago.edu/RePEc/pdfs/BFI_WP_2020152.pdf (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:bfi:wpaper:2020-152

Access Statistics for this paper

More papers in Working Papers from Becker Friedman Institute for Research In Economics Contact information at EDIRC.
Bibliographic data for series maintained by Toni Shears ( this e-mail address is bad, please contact ).

 
Page updated 2025-03-22
Handle: RePEc:bfi:wpaper:2020-152