Spatial Prediction of Soil Organic Carbon Based on a Multivariate Feature Set and Stacking Ensemble Algorithm: A Case Study of Wei-Ku Oasis in China
Zuming Cao,
Xiaowei Luo,
Xuemei Wang and
Dun Li
Additional contact information
Zuming Cao: College of Geographic Science and Tourism, Xinjiang Normal University, Urumqi 830017, China
Xiaowei Luo: College of Geographic Science and Tourism, Xinjiang Normal University, Urumqi 830017, China
Xuemei Wang: College of Geographic Science and Tourism, Xinjiang Normal University, Urumqi 830017, China
Dun Li: College of Geographic Science and Tourism, Xinjiang Normal University, Urumqi 830017, China
Sustainability, 2025, vol. 17, issue 13, 1-25
Abstract:
Accurate estimation of soil organic carbon (SOC) content is crucial for assessing terrestrial ecosystem carbon stocks. Traditional methods offer relatively high estimation accuracy but are limited by poor timeliness and high cost. Combining measured data, remote sensing, and machine learning (ML) algorithms enables rapid, efficient, and accurate large-scale prediction. However, single ML models often suffer from highly redundant feature variables and weak generalization ability. Ensemble models can effectively overcome these problems. This study focuses on the Weigan–Kuqa River oasis (Wei-Ku Oasis), a typical arid oasis in northwest China. It integrates Sentinel-2A multispectral imagery, a digital elevation model, ERA5 meteorological reanalysis data, soil attribute data, and land use (LU) data to estimate SOC. The Boruta algorithm, Lasso regression, and their combination were used to screen feature variables and construct a multidimensional feature space. Random Forest (RF), Gradient Boosting Machine (GBM), and Stacking ensemble models were then built. The Stacking model, constructed by combining the screened variable sets, achieved the best prediction accuracy (test set R² = 0.61, RMSE = 2.17 g·kg⁻¹, RPD = 1.61), reducing the prediction error by 9% compared with single-model prediction. Difference Vegetation Index (DVI), Bare Soil Evapotranspiration (BSE), and type of land use (TLU) have a substantial multidimensional synergistic influence on the spatial differentiation pattern of SOC. Including TLU substantially improved the model's estimation performance, raising the test set R² by 24%. Combining Boruta–Lasso screening with Stacking therefore supports the construction of a high-precision SOC content estimation model. This model can provide technical support for precision fertilization in arid-zone oases and for the management of regional carbon sinks.
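The screening-then-stacking workflow described in the abstract can be illustrated with a minimal sketch in scikit-learn. This is not the authors' implementation: the data are synthetic (`make_regression`), the Boruta step is omitted so that Lasso-based selection alone stands in for the Boruta–Lasso screening, the ridge meta-learner and all hyperparameters are assumptions, and the `rpd` helper uses the common definition of RPD as the standard deviation of the observations divided by the RMSE.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of observations / RMSE."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return float(np.std(y_true, ddof=1) / rmse)


# Synthetic stand-in for the covariate table (spectral indices, terrain,
# climate, soil attributes); the study's real data are not used here.
X, y = make_regression(n_samples=300, n_features=30, n_informative=8,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Base learners (RF, GBM) combined by a meta-learner trained on
# out-of-fold predictions. RidgeCV as meta-learner is an assumption.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("gbm", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=RidgeCV(),
    cv=5,
)

# Lasso-based screening keeps features with non-zero coefficients,
# then passes the reduced feature set to the stacked ensemble.
model = make_pipeline(
    StandardScaler(),
    SelectFromModel(LassoCV(cv=5, random_state=0)),
    stack,
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
rpd_value = rpd(y_test, y_pred)
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())

print(f"selected features: {n_selected} of {X.shape[1]}")
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f}, RPD = {rpd_value:.2f}")
```

Placing the screening step inside the pipeline means the Lasso selection is fitted on the training split only, so the test-set metrics are not influenced by information from the held-out samples.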
Keywords: land use; Boruta–Lasso algorithm; random forest; gradient boosting machine; stacking integration; soil organic carbon
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2025
Downloads:
https://www.mdpi.com/2071-1050/17/13/6168/pdf (application/pdf)
https://www.mdpi.com/2071-1050/17/13/6168/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:17:y:2025:i:13:p:6168-:d:1695199
Sustainability is currently edited by Ms. Alexandra Wu