Specification Search and Stability Analysis
J. Guillermo Llorente and
J. del Hoyo
Additional contact information
J. Guillermo Llorente: Universidad Autonoma de Madrid
J. del Hoyo: Universidad Autonoma de Madrid
No 642, Computing in Economics and Finance 1999 from Society for Computational Economics
Abstract:
Specification analysis precedes model selection for structural analysis or forecasting. To explain a variable, one chooses an optimal subset of k predictors among m indicated variables, often maximizing some goodness of fit or R^2 (or F ). Without such a process, one has potentially misleading data mining. Foster et al. (1997) use maximum R^2 to for this purpose. They feel proper cut-off points of the R^2 distribution require consideration of the selection procedure and hence the use of the distribution function of the maximal R^2 . This difficult function must either be simulated by Monte Carlo or approximated as in Foster et al. with Bonferroni or Rencher and Pun bounds. White (1997) proposes using a 'Reality Check,' comparing forecasting performance of the candidate against a benchmark. Out-of-sample prediction is a good performance test, but choosing the benchmark model is more difficult. Surprisingly the full sample is not often exploited in testing for data mining. We argue that testing with both full sample and recursive estimation along the sample reduces data mining problems. Before accepting a model with significant global R^2 , it is of use to test for coefficient stability and significance of R^2 along the full sample. A sound theoretical model should remain valid if estimated and tested recursively. Foster et al. use R^2 estimated with the full sample. But models may comply with maximal R^2 statistics and be spurious (nonconstant coefficients). We propose to consider the information from the recursive estimations to detect this situation. We add to the processes of model selection and data mining possible parameter variation, which can bias the choice of benchmark model or the specification search among the m variables. Time-varying parameters (TVP) that are assumed constant produce misspecification error, possibly contaminating subsequent analyses. 
Thus, del Hoyo and Llorente (1998a) study the improvement in forecasting that arises from considering nonconstant parameters. We consider both means (discrimination and stability) for decreasing biases in choosing a model. The first stage uses R^2 or R^2_{max} to select the optimal explanatory variables. The second stage tests the stability and constancy of the relationship. The conditional distributions of the recursive statistics are tabulated, conditional on the discrimination stage. The innovation here is the sequential consideration of both procedures. Section 1 introduces the problem. Section 2 tabulates the distributions of the relevant statistics and considers their size and power. Section 3 introduces the sequential procedure described above, and the conditional distributions are studied. Section 5 gives an illustration with the model proposed by Campbell, Grossman and Wang (1993). Section 6 concludes.
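The second stage, recursive estimation along the sample, can be sketched as re-fitting the selected model on expanding subsamples and tracking the paths of the coefficients and of R^2. This is a minimal sketch under assumed simulated data; the paper's actual recursive statistics and their tabulated conditional distributions are not reproduced here.

```python
import numpy as np

def recursive_ols(y, Z, start):
    """Re-estimate OLS on expanding samples y[:t], Z[:t] for
    t = start..n; return the paths of coefficients and R^2
    (hypothetical helper illustrating the recursive stage)."""
    n = len(y)
    betas, r2s = [], []
    for t in range(start, n + 1):
        yt, Zt = y[:t], Z[:t]
        beta, *_ = np.linalg.lstsq(Zt, yt, rcond=None)
        resid = yt - Zt @ beta
        tss = np.sum((yt - yt.mean()) ** 2)
        r2s.append(1.0 - np.sum(resid ** 2) / tss)
        betas.append(beta)
    return np.array(betas), np.array(r2s)

# Stable relationship: the recursive estimates should settle near the
# true coefficient, and the recursive R^2 should remain significant.
rng = np.random.default_rng(1)
x = rng.standard_normal(300)
y = 0.8 * x + 0.3 * rng.standard_normal(300)
Z = np.column_stack([np.ones(300), x])  # intercept plus one regressor
betas, r2s = recursive_ols(y, Z, start=30)
```

For a spurious model with nonconstant coefficients, the coefficient path would drift and the recursive R^2 would deteriorate over subsamples even when the full-sample R^2 is significant, which is the situation the sequential procedure is designed to detect.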
Date: 1999-03-01
New Economics Papers: this item is included in nep-ecm
Downloads: (external link)
http://fmwww.bc.edu/cef99/papers/llorente.pdf main text (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:sce:scecf9:642
More papers in Computing in Economics and Finance 1999 from Society for Computational Economics CEF99, Boston College, Department of Economics, Chestnut Hill MA 02467 USA. Contact information at EDIRC.
Bibliographic data for series maintained by Christopher F. Baum.