Accounting for Spatial Autocorrelation in Algorithm-Driven Hedonic Models: A Spatial Cross-Validation Approach

Juergen Deppner, Marcelo Cajias and Wolfgang Schäfers

ERES from European Real Estate Society (ERES)

Abstract: Aim of research: Real estate markets have a spatial dimension that is pivotal for the economic value of housing. The inherent spatial dependence in the underlying price determination process cannot simply be overlooked in linear hedonic model specifications, as this would yield spurious results (see Anselin, 1988; Can and Megbolugbe, 1997; Basu and Thibodeau, 1998). Guidance on how to account for spatial dependence in linear regression models is extensive and remains the subject of many contributions to the hedonic and spatial econometric literature (see LeSage and Pace, 2009; Anselin, 2010; Elhorst, 2014). Moving from the parametric paradigm of hedonic regression to non-parametric statistical learning methods such as decision trees, random forests, or boosting techniques, the literature has produced a growing body of evidence that such algorithms can deliver superior predictive performance for complex non-linear and multi-dimensional regression problems, including various applications to house price estimation (e.g. Mayer et al., 2019; Pace and Hayunga, 2020; Bogin and Shui, 2020). However, in contrast to linear models, little attention has been paid to the implications of spatial dependence in house prices for the statistical validity of the error estimates of machine learning algorithms, although independence of the data is implicitly assumed (see Roberts et al., 2017; Schratz et al., 2019). Our study investigates the effect of spatial autocorrelation (SAC) on the accuracy assessment of algorithmic hedonic methods, benchmarking spatially conscious machine learning approaches against linear and spatial hedonic methods. Study design and methodology: Machine learning algorithms learn the relationship between the response and the regressors autonomously, without requiring any a priori specification of its functional form.
As their high flexibility makes such approaches prone to overfitting, resampling strategies such as k-fold cross-validation are applied to approximate a model's out-of-sample predictive performance. During resampling, the observations are randomly partitioned into mutually exclusive training and test subsets; the predictor is fitted on the training data and evaluated on the test data. SAC can be accounted for using spatial resampling strategies, which attempt to reduce SAC between training and test data by modifying the splitting process. Instead of randomly partitioning the data, which implicitly assumes their independence, spatially clustered partitions are created using the observations' coordinates (see Brenning, 2012). We train and evaluate tree-based algorithms on a pooled cross-section of asking rents in Germany using both random and spatial partitioning, and subsequently forecast out-of-sample data to assess the bias in the in-sample error estimates associated with SAC. The results are benchmarked against well-specified ordinary least squares and spatial autoregressive frameworks to compare the models' generalizability. Originality and implications: Applying machine learning to spatial data without accounting for SAC provides the predictor with information that is assumed to be unavailable during training, which may lead to biased accuracy assessment (see Lovelace et al., 2021). This study sheds light on the accuracy bias of random resampling induced by SAC in a hedonic context. The results prove useful for increasing the robustness and generalizability of algorithmic approaches to hedonic regression problems and thus carry valuable implications for appraisal practice. To the best of our knowledge, no research in the existing literature has thus far accounted for SAC in an algorithm-driven hedonic context by applying spatial cross-validation.
We conclude that random resampling yields over-optimistic prediction accuracies, whereas spatial resampling increases generalizability and thus robustness to unseen data. We also find the bias to be lower for algorithms that apply column subsampling to counteract overfitting.
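
The difference between random and spatial partitioning described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract refers to spatially clustered partitions (Brenning, 2012), whereas the sketch below swaps in a simpler grid-block scheme, and the function names, the synthetic coordinates, and the `cell_size` parameter are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def random_folds(n, k, seed=0):
    """Random k-fold: shuffle the indices and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def spatial_block_folds(coords, k, cell_size, seed=0):
    """Spatial k-fold (grid-block variant, an assumption of this sketch):
    bin points into square grid cells and assign whole cells to folds,
    so that neighbouring observations tend to share a fold."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        cells[(int(x // cell_size), int(y // cell_size))].append(i)
    keys = sorted(cells)
    random.Random(seed).shuffle(keys)
    folds = [[] for _ in range(k)]
    for j, key in enumerate(keys):
        folds[j % k].extend(cells[key])  # a cell is never split across folds
    return [sorted(f) for f in folds]


def train_test_splits(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for t, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != t for i in fold]
        yield train, test


# Hypothetical data: 200 observations with coordinates in a 100 x 100 square.
rng = random.Random(42)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]

rand = random_folds(len(coords), k=4)
spatial = spatial_block_folds(coords, k=4, cell_size=25)

for train, test in train_test_splits(spatial):
    print(len(train), len(test))
```

Under random folds, a test observation usually has close neighbours in the training set; under block folds, whole grid cells are held out together, which reduces (but does not eliminate) spatial dependence between training and test data. Fold sizes are generally unequal in the spatial variant because cells contain different numbers of observations.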

Keywords: Hedonic Models; Machine Learning; Spatial Autocorrelation; Spatial Cross Validation
JEL-codes: R3
Date: 2021-01-01
New Economics Papers: this item is included in nep-big, nep-ecm, nep-isf, nep-ore and nep-ure

Downloads: (external link)
https://eres.architexturez.net/doc/oai-eres-id-eres2021-51 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:arz:wpaper:eres2021_51

Bibliographic data for series maintained by Architexturez Imprints.

 
Page updated 2025-03-22
Handle: RePEc:arz:wpaper:eres2021_51