EconPapers    
Economics at your fingertips  
 

When is the first spurious variable selected by sequential regression procedures?

Weijie J Su

Biometrika, 2018, vol. 105, issue 3, 517-527

Abstract: SummaryApplied statisticians use sequential regression procedures to rank explanatory variables and, in settings of low correlations between variables and strong true effect sizes, expect that variables at the top of this ranking are truly relevant to the response. In a regime of certain sparsity levels, however, we show that the lasso, forward stepwise regression, and least angle regression include the first spurious variable unexpectedly early. We derive a sharp prediction of the rank of the first spurious variable for these three procedures, demonstrating that it occurs earlier and earlier as the regression coefficients become denser. This phenomenon persists for statistically independent Gaussian random designs and arbitrarily large true effects. We gain insight by identifying the underlying cause and then introduce a simple visualization tool termed the double-ranking diagram to improve on these methods. We obtain the first result establishing the exact equivalence between the lasso and least angle regression in the early stages of solution paths beyond orthogonal designs. This equivalence implies that many important model selection results concerning the lasso can be carried over to least angle regression.

Keywords: False variable; Familywise error rate; Forward stepwise regression; Lasso; Least angle regression (search for similar items in EconPapers)
Date: 2018
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://hdl.handle.net/10.1093/biomet/asy032 (application/pdf)
Access to full text is restricted to subscribers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:oup:biomet:v:105:y:2018:i:3:p:517-527.

Ordering information: This journal article can be ordered from
https://academic.oup.com/journals

Access Statistics for this article

Biometrika is currently edited by Paul Fearnhead

More articles in Biometrika from Biometrika Trust Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, UK.
Bibliographic data for series maintained by Oxford University Press ().

 
Page updated 2025-03-19
Handle: RePEc:oup:biomet:v:105:y:2018:i:3:p:517-527.