An Exploration of Regression-Based Data Mining Techniques Using Super Computation
Antony Davies
No 2008-008, Working Papers from The George Washington University, Department of Economics, H. O. Stekler Research Program on Forecasting
Abstract:
Regression analysis is intended to be used when the researcher seeks to test a given hypothesis against a data set. Unfortunately, in many applications it is either not possible to specify a hypothesis, typically because the research is in a very early stage, or it is not desirable to form a hypothesis, typically because the number of potential explanatory variables is very large. In these cases, researchers have resorted either to overt data mining techniques such as stepwise regression, or covert data mining techniques such as running variations on regression models prior to running the final model (also known as “data peeking”). While data mining side-steps the need to form a hypothesis, it is highly susceptible to generating spurious results. This paper draws on the known properties of OLS estimators in models with omitted or extraneous variables to propose a procedure for data mining that attempts to distinguish between parameter estimates that are significant due to an underlying structural relationship and those that are significant due to random chance.
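The susceptibility to spurious results noted in the abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure; it is a hypothetical example in which the dependent variable and every candidate regressor are independent noise, so any "significant" slope is due to chance alone. The function names, sample size, number of candidates, and the critical value 1.984 (two-sided 5% level, 98 degrees of freedom) are all assumptions chosen for illustration.

```python
import math
import random


def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS regression of y on x with an intercept.

    For a single regressor this equals r * sqrt((n - 2) / (1 - r^2)),
    where r is the sample correlation between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def count_spurious(n_obs=100, n_candidates=200, t_crit=1.984, seed=0):
    """Count pure-noise regressors whose slope looks significant.

    Assumption: y and every candidate x are independent standard normal
    draws, so there is no structural relationship to find.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(slope_t_stat(x, y)) > t_crit:
            hits += 1
    return hits


spurious = count_spurious()
print(f"{spurious} of 200 unrelated regressors look significant at the 5% level")
```

With a 5% test, roughly one candidate in twenty is expected to pass by chance, which is why a search over many candidate regressors needs some way to separate structural relationships from random ones.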
Keywords: exhaustive; regression; all subsets; stepwise; data mining
JEL-codes: C10 C40 C63
Pages: 36
Date: 2008-08
Downloads:
https://www2.gwu.edu/~forcpgm/2008-008.pdf First version, 2008 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:gwc:wpaper:2008-008