Sparse Linear Models and Two-Stage Estimation in High-Dimensional Settings with Possibly Many Endogenous Regressors

Zhu, Ying

Sparse Linear Models and Two-Stage Estimation in High-Dimensional Settings with Possibly Many Endogenous Regressors

Ying Zhu

MPRA Paper from University Library of Munich, Germany

Abstract: This paper explores the validity of the two-stage estimation procedure for sparse linear models in high-dimensional settings with possibly many endogenous regressors. In particular, the number of endogenous regressors in the main equation and the instruments in the first-stage equations can grow with and exceed the sample size n. The analysis concerns the exact sparsity case, i.e., the maximum number of non-zero components in the vectors of parameters in the first-stage equations, k1, and the number of non-zero components in the vector of parameters in the second-stage equation, k2, are allowed to grow with n but slowly compared to n. I consider the high-dimensional version of the two-stage least square estimator where one obtains the fitted regressors from the first-stage regression by a least square estimator with l_1-regularization (the Lasso or Dantzig selector) when the first-stage regression concerns a large number of instruments relative to n, and then construct a similar estimator using these fitted regressors in the second-stage regression. The main theoretical results of this paper are non-asymptotic bounds from which I establish sufficient scaling conditions on the sample size for estimation consistency in l_2-norm and variable-selection consistency (i.e., the two-stage high-dimensional estimators correctly select the non-zero coefficients in the main equation with high probability). A technical issue regarding the so-called "restricted eigenvalue (RE) condition" for estimation consistency and the "mutual incoherence (MI) condition" for selection consistency arises in the two-stage estimation from allowing the number of regressors in the main equation to exceed n and this paper provides analysis to verify these RE and MI conditions. Depending on the underlying assumptions that are imposed, the upper bounds on the l_2-error and the sample size required to obtain these consistency results differ by factors involving k1 and/or k2. Simulations are conducted to gain insight on the finite sample performance of the high-dimensional two-stage estimator.

Keywords: High-dimensional statistics; Lasso; sparse linear models; endogeneity; two-stage estimation (search for similar items in EconPapers)
JEL-codes: C1 C13 C31 C36 (search for similar items in EconPapers)
Date: 2013-09-17
New Economics Papers: this item is included in nep-ecm
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://mpra.ub.uni-muenchen.de/49846/1/MPRA_paper_49846.pdf original version (application/pdf)
https://mpra.ub.uni-muenchen.de/65703/7/MPRA_paper_65703.pdf revised version (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:pra:mprapa:49846

Access Statistics for this paper

More papers in MPRA Paper from University Library of Munich, Germany Ludwigstraße 33, D-80539 Munich, Germany. Contact information at EDIRC.
Bibliographic data for series maintained by Joachim Winter ().