XTSEL: Selection of variables and specification in a panel-data framework
Alfonso Ugarte-Ruiz
2020 Stata Conference from Stata Users Group
Abstract:
We have developed two new commands for a panel-data framework: xtselvar, which selects the best predictor among a number of alternative explanatory variables (candidates), and xtselmod, which selects the best specification among all possible combinations of a defined set of explanatory variables.
xtselvar estimates the same specification n times, keeping the dependent variable and an optional list of control variables fixed. At each repetition, the procedure adds only one of the n candidate variables to the specification (in addition to the fixed control variables), until every candidate variable listed by the user has been included and evaluated. For each candidate variable, the procedure estimates seven parameters and statistical criteria.
xtselmod is closely related to xtselvar and relies heavily on the Stata command tuples. Given n possible explanatory variables, the procedure estimates 2^n - 1 different specifications, one for each possible non-empty combination. For each of them, the procedure estimates a set of five statistical criteria.
More specifically, xtselvar estimates seven statistics per variable (coefficient, t-statistic, adjusted R2, AIC, BIC, U-Theil in the time-series dimension, and U-Theil in the cross-individual dimension), while xtselmod estimates only the last five per specification. The procedures then rank each variable or specification according to each of those last five criteria, generating one ranking per criterion. They also compute a composite ranking summarizing all five criteria. Finally, they sort all candidate variables or specifications according to the selected ranking, which by default is the composite ranking.
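The enumeration and ranking steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the commands' actual implementation: the function names (all_specifications, rank, composite_rank) and the weights argument are hypothetical, and the composite ranking is assumed to be a weighted average of the per-criterion ranks, a rule the abstract does not spell out.

```python
from itertools import combinations


def all_specifications(candidates):
    """Return every non-empty combination of the candidate variables."""
    specs = []
    for size in range(1, len(candidates) + 1):
        specs.extend(combinations(candidates, size))
    return specs


def rank(values, higher_is_better=False):
    """Return ranks (1 = best); ties are broken by position, a simplification."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def composite_rank(criteria, weights=None):
    """Rank items by the weighted average of their per-criterion ranks.

    criteria maps a criterion name to (values, higher_is_better).
    The weighted average is an ASSUMED aggregation rule, used here only
    for illustration; it is not taken from xtselvar or xtselmod.
    """
    names = list(criteria)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    total = sum(weights[name] for name in names)
    n_items = len(next(iter(criteria.values()))[0])
    score = [0.0] * n_items
    for name in names:
        values, higher_is_better = criteria[name]
        for i, r in enumerate(rank(values, higher_is_better)):
            score[i] += weights[name] * r / total
    return rank(score)  # a lower average rank is better


specs = all_specifications(["x1", "x2", "x3"])
print(len(specs))  # 2**3 - 1 = 7 specifications
```

Criteria for which smaller is better (AIC, BIC, U-Theil) are passed with higher_is_better=False, and adjusted R2 with higher_is_better=True. Putting all the weight on a single criterion makes the composite ranking coincide with the ranking on that criterion.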
The out-of-sample evaluation of each candidate variable and specification is based on the commands xtoos_t and xtoos_i, which must be installed in Stata for the procedures to run. xtselvar and xtselmod allow one to choose weights for each of the five criteria used to compute the composite ranking. They also allow one to rank the variables and specifications according to a specific criterion of preference. For instance, if the primary objective of the estimation is to obtain the most accurate prediction of the dependent variable, one could choose to rank the candidate variables and specifications according only to their forecasting ability, that is, according to the estimated U-Theil in its time-series dimension. The procedures allow one to choose different estimation methods, including some dynamic methodologies, and can also be used in a dataset with only time-series observations. When the specification includes lags of the dependent variable, the procedures automatically generate dynamic forecasts for the out-of-sample evaluation. For the out-of-sample evaluation in the time-series dimension, they allow one to choose a specific horizon h at which to evaluate the forecasting performance of the model that includes the candidate variable or specification. They also allow one to evaluate the forecasting performance over all horizons from t+1 to t+h. xtselmod adapts the Stata command tuples so that it accepts time-series operators such as lags, leads, and differences. Importantly, it also allows one to use the conditionals option of the command tuples, with the same structure and syntax.
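As a rough illustration of evaluating forecasts at horizons t+1 to t+h, the following Python sketch computes a Theil's U statistic per horizon for a single time series. It rests on several assumptions that are not taken from the abstract: U is defined as the ratio of the model's out-of-sample RMSE to the RMSE of a naive no-change forecast (a common definition, but the variant used by xtoos_t is not stated here), the forecast origins roll forward one period at a time, and forecast_fn is a hypothetical callable standing in for whatever estimation method is chosen.

```python
import math


def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def theil_u_by_horizon(series, forecast_fn, first_origin, h):
    """Theil's U for horizons 1..h from rolling forecast origins.

    series       : list of observed values
    forecast_fn  : HYPOTHETICAL callable (history, h) -> list of h forecasts
    first_origin : index of the first out-of-sample observation
    Returns {horizon: U} with U = RMSE(model) / RMSE(naive no-change),
    an ASSUMED definition. U < 1 means the model beats the naive forecast.
    A constant series makes the naive RMSE zero and U undefined.
    """
    model_errors = {k: [] for k in range(1, h + 1)}
    naive_errors = {k: [] for k in range(1, h + 1)}
    for origin in range(first_origin, len(series) - h + 1):
        history = series[:origin]            # data available at time t
        forecasts = forecast_fn(history, h)  # forecasts for t+1, ..., t+h
        for k in range(1, h + 1):
            actual = series[origin + k - 1]
            model_errors[k].append(actual - forecasts[k - 1])
            naive_errors[k].append(actual - history[-1])  # last observed value
    return {k: rmse(model_errors[k]) / rmse(naive_errors[k])
            for k in range(1, h + 1)}


def no_change(history, h):
    """Naive benchmark: repeat the last observed value h times."""
    return [history[-1]] * h
```

Passing no_change as forecast_fn yields U = 1 at every horizon by construction, which gives a quick sanity check before plugging in a real forecasting routine.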
Date: 2020-08-20
Downloads:
http://fmwww.bc.edu/repec/scon2020/us20_Ugarte-Ruiz.pdf
Persistent link: https://EconPapers.repec.org/RePEc:boc:scon20:16