Prediction of TOC Content in Organic-Rich Shale Using Machine Learning Algorithms: Comparative Study of Random Forest, Support Vector Machine, and XGBoost

Jiangtao Sun, Wei Dang, Fengqin Wang, Haikuan Nie, Xiaoliang Wei, Pei Li, Shaohua Zhang, Yubo Feng and Fei Li
Additional contact information
Jiangtao Sun: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Wei Dang: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Fengqin Wang: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Haikuan Nie: Petroleum Exploration and Production Research Institute, SINOPEC, Beijing 100083, China
Xiaoliang Wei: Exploration and Development Institute of Shengli Oilfield Company, SINOPEC, Dongying 257000, China
Pei Li: Petroleum Exploration and Production Research Institute, SINOPEC, Beijing 100083, China
Shaohua Zhang: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Yubo Feng: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China
Fei Li: School of Earth Sciences and Engineering, Xi’an Shiyou University, Xi’an 710065, China

Energies, 2023, vol. 16, issue 10, 1-26

Abstract: The total organic carbon (TOC) content of organic-rich shale is a key parameter in screening for potential source rocks and shale oil/gas sweet spots. Traditional methods of determining TOC content are either costly and inefficient (geochemical experiments) or low in accuracy and not universally applicable (empirical mathematical regression). In this study, we propose three machine learning models, random forest (RF), support vector regression (SVR), and XGBoost, to predict TOC content from well logs, and the performance of each model is compared with that of traditional empirical methods. First, the decision tree algorithm is used to identify the optimal set of well logs from a total of 15. Then, 816 data points of well logs and TOC content collected from five different shale formations are used to train and test the three models. Finally, the accuracy of the three models is validated by predicting unknown TOC content in a shale oil well. The results show that the RF model provides the best prediction of TOC content, with R² = 0.915, MSE = 0.108, and MAE = 0.252, followed by XGBoost, while SVR gives the lowest predictive accuracy. Nevertheless, all three machine learning models outperform traditional empirical methods such as the Schmoker gamma-ray log method, the multiple linear regression method, and the ΔlgR method. Overall, the proposed machine learning models are powerful tools for predicting the TOC content of shale and improving oil/gas exploration efficiency in other formations and basins.
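
The abstract ranks the models by R², MSE, and MAE. As a minimal sketch of how these three scores are commonly computed from measured and predicted values, the snippet below uses only the Python standard library. The function name `regression_metrics` and the sample TOC values are hypothetical illustrations and are not taken from the paper; the authors' own evaluation code is not shown in this record.

```python
from statistics import mean


def regression_metrics(y_true, y_pred):
    """Return (R^2, MSE, MAE) for paired measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = mean(r * r for r in residuals)    # mean squared error
    mae = mean(abs(r) for r in residuals)   # mean absolute error

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is the spread around the mean
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    ss_res = sum(r * r for r in residuals)
    r2 = 1.0 - ss_res / ss_tot
    return r2, mse, mae


# Hypothetical TOC values in wt% (illustration only, not data from the paper)
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
r2, mse, mae = regression_metrics(measured, predicted)
```

A higher R² together with lower MSE and MAE indicates a closer fit, which is the sense in which the abstract reports RF as the best of the three models.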

Keywords: TOC content; random forest; support vector machine; XGBoost; organic-rich shale
JEL-codes: Q Q0 Q4 Q40 Q41 Q42 Q43 Q47 Q48 Q49
Date: 2023
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://www.mdpi.com/1996-1073/16/10/4159/pdf (application/pdf)
https://www.mdpi.com/1996-1073/16/10/4159/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jeners:v:16:y:2023:i:10:p:4159-:d:1149629

Energies is currently edited by Ms. Agatha Cao

More articles in Energies from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jeners:v:16:y:2023:i:10:p:4159-:d:1149629