A water solubility prediction algorithm based on the StackBoost model
Bin Pan, Xiaoyu Hou, Mingxin Zhang, Jingxian Yu, Conghui Zhang, Yunhui Zhang, Xiaolong Su and Shuangcai Li
PLOS ONE, 2025, vol. 20, issue 8, 1-18
Abstract:
Aqueous solubility is an essential physical property of compounds, with significant applications across many fields. However, determining the solubility of compounds experimentally often requires substantial human and material resources. To address this issue, this study introduces the StackBoost model for predicting the solubility of organic compounds and systematically compares it with five well-known ensemble learning algorithms: Adaptive Boosting (AdaBoost), Gradient Boosted Regression Trees (GBRT), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), and Random Forest (RF). The results indicate that the StackBoost model excels at predicting aqueous solubility, achieving a coefficient of determination (R²) of 0.90, a root mean square error (RMSE) of 0.29, and a mean absolute error (MAE) of 0.22, significantly outperforming the other models in the comparison. The study also conducted high-throughput screening on large-scale datasets and identified compounds with high potential for water solubility. Additionally, the model’s generalization ability is verified through transfer learning. Although the performance of the StackBoost model decreases when applied to different datasets, it still shows considerable transferability, making it a more generalizable prediction model for aqueous solubility.
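The three metrics reported in the abstract (R², RMSE, MAE) follow their standard definitions. The sketch below shows how they can be computed from paired observed and predicted values. It is not the authors' evaluation code: the function name `regression_metrics` and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values.

    Hypothetical helper for illustration; standard textbook definitions.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    # Residual and total sums of squares
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")

    r2 = 1.0 - ss_res / ss_tot                 # coefficient of determination
    rmse = math.sqrt(ss_res / n)               # root mean square error
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    return r2, rmse, mae


# Toy values invented for illustration only (not data from the study)
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.2f} RMSE={rmse:.2f} MAE={mae:.2f}")
```

Higher R² and lower RMSE and MAE indicate a closer fit. RMSE and MAE are expressed in the same units as the predicted quantity.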
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0330598 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 30598&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0330598
DOI: 10.1371/journal.pone.0330598
More articles in PLOS ONE from Public Library of Science