Digital Mapping of Soil pH and Driving Factor Analysis Based on Environmental Variable Screening
He Huang,
Yaolin Liu,
Yanfang Liu,
Zhaomin Tong,
Zhouqiao Ren and
Yifan Xie
Additional contact information
He Huang: School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China
Yaolin Liu: School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China
Yanfang Liu: School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China
Zhaomin Tong: School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China
Zhouqiao Ren: Institute of Digital Agriculture, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
Yifan Xie: School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China
Sustainability, 2025, vol. 17, issue 7, 1-21
Abstract:
This study considers soil formation factors such as land use type, soil type, depth, and geographical conditions in Lanxi City, China. Using multi-source public data and data collected from farmland in the study area in 2022, three environmental variable screening methods, namely the Boruta algorithm, Recursive Feature Elimination (RFE), and Particle Swarm Optimization (PSO), were applied to optimize and combine 47 environmental variables for modeling soil pH, and their effects were evaluated. A Random Forest (RF) model was used to predict soil pH in the study area. Pearson correlation analysis, an environmental variable importance assessment based on the RF model, and the SHAP explanatory model were used to explore the main controlling factors of soil pH and reveal its spatial differentiation mechanism. The results showed that, with a large number of environmental variables, the RF model with covariates selected by PSO had higher prediction accuracy (MAE = 0.496, RMSE = 0.641, R² = 0.413, LCCC = 0.508) than the Boruta–RF, RFE–RF, and all-variable RF models. This indicates that PSO, as a covariate selection method, effectively optimized the input variables for the RF model and improved its performance. In addition, the Pearson correlation analysis, the RF-model-based environmental variable importance assessment, and the SHAP explanatory model consistently indicate that Channel Network Base Level (CNBL), Elevation (DEM), Temperature mean (T_m), Evaporation (E_m), Land surface temperature mean (LST_m), and Humidity mean (H_m) are key factors affecting the spatial differentiation of soil pH.
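The four accuracy measures quoted in the results (MAE, RMSE, R², LCCC) can be computed from paired observed and predicted values. The sketch below is a minimal illustration, not the authors' code. It assumes that R² is the coefficient of determination (1 - SSE/SST) and that LCCC is Lin's concordance correlation coefficient with population (divide-by-n) variances; the abstract does not state which definitions were used. The function name and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, R^2 and Lin's concordance correlation coefficient.

    Assumptions (not confirmed by the abstract): R^2 = 1 - SSE/SST, and
    LCCC uses population (divide-by-n) variances and covariance.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n

    if var_o == 0:
        raise ValueError("observed values must not all be identical")

    # Coefficient of determination: share of observed variance explained.
    r2 = 1.0 - mse / var_o
    # Lin's CCC penalises both scatter and systematic bias (mean shift).
    lccc = 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

    return {"mae": mae, "rmse": rmse, "r2": r2, "lccc": lccc}


# Made-up pH values for illustration only (not from the study).
observed_ph = [5.0, 6.0, 7.0, 8.0]
predicted_ph = [5.5, 5.5, 7.5, 7.5]
print(regression_metrics(observed_ph, predicted_ph))
```

Unlike R², LCCC also falls when predictions are systematically shifted or rescaled relative to the observations, which is one reason the two are often reported together.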
In summary, the approach of using PSO for covariate selection before applying the RF model exhibits high prediction accuracy and can serve as an effective method for predicting the spatial distribution of soil pH, providing important references for accurately simulating the spatial mapping of soil attributes in hilly and basin areas.
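To make the idea of PSO-based covariate selection more concrete, here is a minimal sketch of a binary PSO that searches over 0/1 masks of candidate variables. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which PSO variant, parameters, or fitness function were used. The sketch assumes a common binary variant with a sigmoid transfer function, and it uses a hypothetical toy fitness (`toy_fitness`, with a made-up `INFORMATIVE` set) in place of what would, in a setting like the study's, be something such as the cross-validated accuracy of an RF model trained on the selected variables.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise fitness(mask) over 0/1 masks of length n_features.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Random initial masks (positions) and velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]

    # Personal bests and the global best found so far.
    pbest = [p[:] for p in pos]
    pbest_score = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so the sigmoid never saturates completely.
                vel[i][d] = max(-6.0, min(6.0, v))
                # Sigmoid turns velocity into the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0

            score = fitness(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score

    return gbest, gbest_score


# Hypothetical stand-in for a model-based score: reward masks that pick
# the "informative" variables and lightly penalise the number selected.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, best_score = binary_pso(toy_fitness, n_features=8, seed=1)
print(best_mask, best_score)
```

Replacing `toy_fitness` with a function that trains and validates a model on the masked variables would turn this into a wrapper-style selector; that is considerably more expensive, since every fitness call then involves fitting a model.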
Keywords: PSO; environmental variable screening; SHAP; soil pH; digital soil mapping
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2025
Downloads:
https://www.mdpi.com/2071-1050/17/7/3173/pdf (application/pdf)
https://www.mdpi.com/2071-1050/17/7/3173/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:17:y:2025:i:7:p:3173-:d:1627255
Sustainability is currently edited by Ms. Alexandra Wu
Bibliographic data for series maintained by MDPI Indexing Manager.