Parallel cross-validation: A scalable fitting method for Gaussian process models

Florian Gerber and Douglas W. Nychka

Computational Statistics & Data Analysis, 2021, vol. 155, issue C

Abstract: Gaussian process (GP) models are widely used to analyze spatially referenced data and to predict values at locations without observations. They are based on a statistical framework, which enables uncertainty quantification of the model structure and predictions. Both the evaluation of the likelihood and the prediction involve solving linear systems. Hence, the computational costs are large and limit the amount of data that can be handled. While there are many approximation strategies that lower the computational cost of GP models, they often provide sub-optimal support for the parallel computing capabilities of (high-performance) computing environments. To bridge this gap a parallelizable parameter estimation and prediction method is presented. The key idea is to divide the spatial domain into overlapping subsets and to use cross-validation (CV) to estimate the covariance parameters in parallel. Although simulations show that CV is less effective for parameter estimation than the maximum likelihood method, it is amenable to parallel computing and enables the handling of large datasets. Exploiting the screen effect for spatial prediction helps to arrive at a spatial analysis that is close to a global computation despite performing parallel computations on local regions. Simulation studies assess the accuracy of the parameter estimates and predictions. The implementation shows good weak and strong parallel scaling properties. For illustration, an exponential covariance model is fitted to a scientifically relevant canopy height dataset with 5 million observations. Using 512 processor cores in parallel brings the evaluation time of one covariance parameter configuration to 1.5 minutes.
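
Illustration (not from the paper): the key idea stated in the abstract, overlapping spatial subsets scored by cross-validation in parallel, could be sketched roughly as follows. Everything here is an assumption made for illustration: the function names (exp_cov, make_blocks, block_cv_score, cv_objective), the unit-square domain, the square blocks with a fixed margin, the closed-form leave-one-out residuals of a zero-mean GP, the squared-error score, and the use of Python threads. The authors' actual implementation, CV scheme, and parallel backend may differ.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def exp_cov(a, b, range_, sill=1.0):
    """Exponential covariance sill * exp(-d / range_) between point sets a and b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)


def make_blocks(xy, n_side, overlap):
    """Split the unit square into n_side x n_side cells (hypothetical layout).

    Each block is (core, wide): indices of points inside the cell, and indices
    of points inside the cell enlarged by `overlap` on every side.
    """
    edges = np.linspace(0.0, 1.0, n_side + 1)
    blocks = []
    for i in range(n_side):
        for j in range(n_side):
            lo = np.array([edges[i], edges[j]])
            hi = np.array([edges[i + 1], edges[j + 1]])
            core = np.all((xy >= lo) & (xy < hi), axis=1)
            wide = np.all((xy >= lo - overlap) & (xy < hi + overlap), axis=1)
            blocks.append((np.flatnonzero(core), np.flatnonzero(wide)))
    return blocks


def block_cv_score(task):
    """Sum of squared leave-one-out residuals for the core points of one block."""
    xy, y, core, wide, range_ = task
    # Small jitter on the diagonal for numerical stability (an assumption).
    K = exp_cov(xy[wide], xy[wide], range_) + 1e-8 * np.eye(len(wide))
    K_inv = np.linalg.inv(K)
    # Closed-form LOO residuals of a zero-mean GP: (K^-1 y)_i / (K^-1)_ii.
    loo_resid = (K_inv @ y[wide]) / np.diag(K_inv)
    in_core = np.isin(wide, core)  # score each point once, in its own cell
    return float(np.sum(loo_resid[in_core] ** 2))


def cv_objective(xy, y, blocks, range_, workers=4):
    """Evaluate one covariance parameter configuration; blocks run in parallel."""
    tasks = [(xy, y, core, wide, range_) for core, wide in blocks]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(block_cv_score, tasks))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xy = rng.uniform(size=(400, 2))
    # Simulate a zero-mean GP with range 0.2 via a Cholesky factor.
    L = np.linalg.cholesky(exp_cov(xy, xy, 0.2) + 1e-8 * np.eye(400))
    y = L @ rng.standard_normal(400)
    blocks = make_blocks(xy, n_side=3, overlap=0.1)
    for r in (0.05, 0.1, 0.2, 0.4):
        print(r, cv_objective(xy, y, blocks, r))
```

In this sketch only the range parameter is scored: without a nugget, the squared-error leave-one-out criterion does not depend on the sill, so a different score would be needed to estimate it. Because each block only needs its own points, the per-block work is independent, which is what makes a scheme of this kind amenable to parallel evaluation.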

Keywords: Cross-validation; Gaussian random fields; High-performance computing; Kriging; Spatial statistics
Date: 2021

Downloads:
http://www.sciencedirect.com/science/article/pii/S0167947320302048
Full text for ScienceDirect subscribers only.

Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:155:y:2021:i:c:s0167947320302048

DOI: 10.1016/j.csda.2020.107113

Computational Statistics & Data Analysis is currently edited by S.P. Azen

Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-19
Handle: RePEc:eee:csdana:v:155:y:2021:i:c:s0167947320302048