A random forests-based hedonic price model accounting for spatial autocorrelation

Emre Tepe
Additional contact information
Emre Tepe: University of Florida

Journal of Geographical Systems, 2024, vol. 26, issue 4, No 5, 540 pages

Abstract: This paper introduces a spatially explicit random forests-based hedonic price modeling approach to account for spatial autocorrelation in the data. Spatial autocorrelation is a common data structure in georeferenced data, and controlling associations among spatial objects is crucial for accurate statistical analysis. Validations of machine learning and artificial intelligence applications require using out-of-sample data sets to assess models’ fit on the training dataset. Previous research has shown that nonspatial cross-validation methods, commonly used in machine learning applications for spatial data, often provide over-optimistic results. Some recommended the use of spatial cross-validation methods to obtain more reliable estimates. However, the machine learning models used in these previous studies did not include spatially explicit parameters to account for spatial autocorrelation in the data. Unlike machine learning-based models, statistical-based models such as the spatial lag model can effectively account for spatial autocorrelation in the data. This research applied a two-stage least squares random forests framework to construct a hedonic pricing model incorporating a spatial lag for the Miami-Dade single-family residential parcel sales data. Random forests models are evaluated using K-fold, spatial blocking K-fold, and spatial leave-one-out cross-validation methods. The goodness-of-fit of the tested random forests-based models is evaluated using the coefficient of determination and mean square error scores. Additionally, spatial autocorrelations in residuals from random forests models are investigated by conducting Moran’s I test. Our research indicates that failing to account for spatial autocorrelation in data can lead to unreliable and overly optimistic estimates. However, including a spatially lagged variable substantially reduces fluctuations in goodness-of-fit measures across different validation sets.
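
Two of the building blocks named in the abstract, spatial blocking K-fold and Moran's I on residuals, can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names `spatial_block_folds` and `morans_i`, the square-grid blocking scheme, the round-robin fold assignment, and the dense weight matrix are all assumptions made for illustration, and the paper may define blocks and spatial weights differently.

```python
import math
from collections import defaultdict


def spatial_block_folds(coords, block_size, n_folds):
    """Assign each point to a fold by the square grid block it falls in.

    Hypothetical helper: every point in the same block shares a fold, so
    nearby observations are not split across training and validation sets.
    """
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(idx)
    folds = [None] * len(coords)
    # Deal blocks out to folds round-robin (sorted keys keep it deterministic).
    for b, key in enumerate(sorted(blocks)):
        for idx in blocks[key]:
            folds[idx] = b % n_folds
    return folds


def morans_i(values, weights):
    """Global Moran's I for `values` (e.g. model residuals).

    `weights` is assumed to be a dense n x n spatial weight matrix with a
    zero diagonal, where weights[i][j] > 0 marks i and j as neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Four locations along a line, each neighboring the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend_i = morans_i([1, 2, 3, 4], chain)          # smooth trend: positive I
alternating_i = morans_i([1, -1, 1, -1], chain)  # checkerboard: negative I
folds = spatial_block_folds(
    [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (0.5, 1.5)], block_size=1.0, n_folds=2
)
```

In this toy setup a smooth trend along the chain gives a positive Moran's I and an alternating pattern gives a negative one, which is the kind of residual structure the Moran's I test is meant to detect. The first two points fall in the same grid block, so they always land in the same fold.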

Keywords: Two-stage least squares random forests; Moran’s I test; Hedonic price modeling; Spatial blocking K-fold; Spatial leave-one-out test
JEL-codes: C14 C21 C26 C52 R31
Date: 2024

Downloads: (external link)
http://link.springer.com/10.1007/s10109-024-00449-w Abstract (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:kap:jgeosy:v:26:y:2024:i:4:d:10.1007_s10109-024-00449-w

Ordering information: This journal article can be ordered from
http://www.springer. ... ce/journal/10109/PS2

DOI: 10.1007/s10109-024-00449-w

Journal of Geographical Systems is currently edited by Manfred M. Fischer and Antonio Páez

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:kap:jgeosy:v:26:y:2024:i:4:d:10.1007_s10109-024-00449-w