Large‐scale environmental data science with ExaGeoStatR
Sameh Abdulah,
Yuxiao Li,
Jian Cao,
Hatem Ltaief,
David E. Keyes,
Marc G. Genton and
Ying Sun
Environmetrics, 2023, vol. 34, issue 1
Abstract:
Parallel computing in exact Gaussian process (GP) calculations becomes necessary for avoiding computational and memory restrictions associated with large‐scale environmental data science applications. The exact evaluation of the Gaussian log‐likelihood function requires O(n^2) storage and O(n^3) operations, where n is the number of geographical locations. Thus, exactly computing the log‐likelihood function with a large number of locations requires exploiting the power of existing parallel computing hardware systems, such as shared‐memory, possibly equipped with GPUs, and distributed‐memory systems, to solve this exact computational complexity. In this article, we present ExaGeoStatR, a package for exascale geostatistics in R that supports a parallel computation of the exact maximum likelihood function on a wide variety of parallel architectures. Furthermore, the package allows scaling existing GP methods to a large spatial/temporal domain. Prohibitive exact solutions for large geostatistical problems become possible with ExaGeoStatR. Parallelization in ExaGeoStatR depends on breaking down the numerical linear algebra operations in the log‐likelihood function into a set of tasks and rendering them for a task‐based programming model. The package can be used directly through the R environment on parallel systems without the user needing any C, CUDA, or MPI knowledge. Currently, ExaGeoStatR supports several maximum likelihood computation variants such as exact, diagonal super tile and tile low‐rank approximations, and mixed‐precision. ExaGeoStatR also provides a tool to simulate large‐scale synthetic datasets. These datasets can help assess different implementations of the maximum log‐likelihood approximation methods. Herein, we show the implementation details of ExaGeoStatR, analyze its performance on various parallel architectures, and assess its accuracy using synthetic datasets with up to 250K observations.
The experimental analysis covers the exact computation of ExaGeoStatR to demonstrate the parallel capabilities of the package. We provide a hands‐on tutorial to analyze a sea surface temperature real dataset. The performance evaluation involves comparisons with the popular packages GeoR, fields, and bigGP for exact Gaussian likelihood evaluation. The approximation methods in ExaGeoStatR are not considered in this article since they were analyzed in previous studies.
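Illustrative sketch (not part of the article and not the ExaGeoStatR API): the storage and operation counts quoted in the abstract come from building a dense n x n covariance matrix and factorizing it. The serial Python example below shows where each cost arises. It assumes a zero-mean process and an exponential covariance kernel chosen only for illustration; the function names exp_covariance, cholesky, and gaussian_loglik are hypothetical.

```python
import math


def exp_covariance(locs, variance, range_):
    """Dense n x n covariance matrix; this is the O(n^2) storage cost.

    Assumed kernel: C(h) = variance * exp(-h / range_), h = Euclidean distance.
    """
    n = len(locs)
    return [[variance * math.exp(-math.dist(locs[i], locs[j]) / range_)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular factor with L L^T = a; this is the O(n^3) step."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        low[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = s / low[j][j]
    return low


def gaussian_loglik(locs, z, variance, range_):
    """Exact zero-mean Gaussian log-likelihood of observations z at locs."""
    n = len(z)
    low = cholesky(exp_covariance(locs, variance, range_))
    # Forward substitution: solve low * y = z, so that y^T y = z^T C^{-1} z.
    y = [0.0] * n
    for i in range(n):
        y[i] = (z[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(v * v for v in y)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


if __name__ == "__main__":
    locs = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
    z = [0.3, -0.1, 0.8, 0.2]
    print(gaussian_loglik(locs, z, variance=1.0, range_=0.3))
```

A maximum likelihood fit repeats this evaluation for many candidate parameter values, so the factorization dominates the run time. According to the abstract, this is the linear algebra that the package breaks into tasks for parallel execution.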
Date: 2023
DOI: https://doi.org/10.1002/env.2770
Persistent link: https://EconPapers.repec.org/RePEc:wly:envmet:v:34:y:2023:i:1:n:e2770
More articles in Environmetrics from John Wiley & Sons, Ltd.