Linear Regression
Vladimir Shikhman and David Müller
Additional contact information
Vladimir Shikhman: Chemnitz University of Technology
David Müller: Chemnitz University of Technology
Chapter 6 in Mathematical Foundations of Big Data Analytics, 2021, pp 107-129 from Springer
Abstract:
In statistics, linear regression is the most popular approach to modeling the relationship between an endogenous response variable and several exogenous variables aiming to explain the former. It is crucial in linear regression to estimate from the data the unknown weights put on the exogenous variables in order to obtain the endogenous variable. The applications of linear regression in economics alone are so abundant that they can hardly all be mentioned. To name a few, we refer to the econometric analysis of the relationship between GDP output and the unemployment rate, known as Okun’s law, or between price and risk, known as the capital asset pricing model. The use of linear regression is twofold. First, after fitting the linear regression it becomes possible to predict the endogenous variable by observing the exogenous variables. Second, the strength of the relationship between the endogenous and exogenous variables can be quantified. In particular, it can be clarified whether some exogenous variables have no linear relationship with the endogenous variable at all, or which subsets of exogenous variables contain redundant information about it. In this chapter, we discuss the by now classical technique of ordinary least squares (OLS) for linear regression. The OLS problem is derived by means of maximum likelihood estimation, where the error terms are assumed to follow the Gaussian distribution. We show that the use of the OLS estimator is favorable from the statistical point of view: it is the best linear unbiased estimator, as the Gauss–Markov theorem states. From the numerical perspective, we emphasize that the OLS estimator may suffer from instability, especially due to possible multicollinearity in the data. To overcome this obstacle, the ℓ2-regularization approach is proposed. Following the technique of maximum a posteriori estimation, we arrive at ridge regression. Although biased, the ridge estimator reduces variance and hence gains computational stability. Finally, we perform a stability analysis of the OLS and ridge estimators in terms of the condition number of the underlying data matrix.
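To make the abstract’s quantities concrete, the following minimal NumPy sketch computes the OLS estimator (XᵀX)⁻¹Xᵀy, the ridge estimator (XᵀX + λI)⁻¹Xᵀy, and the condition numbers of the corresponding matrices on synthetic near-collinear data. The data-generating setup, the regularization parameter lam, and all variable names are illustrative assumptions, not code from the chapter.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic design matrix with two nearly identical columns
    # to mimic multicollinearity (an illustrative assumption).
    n, p = 100, 3
    X = rng.standard_normal((n, p))
    X[:, 2] = X[:, 1] + 1e-4 * rng.standard_normal(n)
    beta_true = np.array([1.0, 2.0, -1.0])
    y = X @ beta_true + 0.1 * rng.standard_normal(n)

    # OLS estimator: solve (X'X) beta = X'y rather than forming the inverse.
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

    # Ridge estimator with an arbitrarily chosen regularization parameter lam:
    # solve (X'X + lam * I) beta = X'y.
    lam = 1.0
    beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    # Condition numbers quantify numerical stability; regularization
    # lowers the condition number at the price of a biased estimator.
    print("cond(X'X)         =", np.linalg.cond(X.T @ X))
    print("cond(X'X + lam*I) =", np.linalg.cond(X.T @ X + lam * np.eye(p)))
    print("beta_ols   =", beta_ols)
    print("beta_ridge =", beta_ridge)

On such near-collinear data, the condition number of XᵀX is typically enormous, while adding λI shrinks it by orders of magnitude, which illustrates the stability gain the abstract attributes to the ridge estimator.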
Date: 2021
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-662-62521-7_6
Ordering information: This item can be ordered from
http://www.springer.com/9783662625217
DOI: 10.1007/978-3-662-62521-7_6