Data-driven organic solubility prediction at the limit of aleatoric uncertainty
Lucas Attia,
Jackson W. Burns,
Patrick S. Doyle and
William H. Green
Additional contact information
Lucas Attia: MIT
Jackson W. Burns: MIT
Patrick S. Doyle: MIT
William H. Green: MIT
Nature Communications, 2025, vol. 16, issue 1, 1-10
Abstract:
Small molecule solubility is a critically important property that affects the efficiency, environmental impact, and phase behavior of synthetic processes. Experimental determination of solubility is a time- and resource-intensive process, and existing methods for in silico estimation of solubility are limited in their generality, speed, and accuracy. This work presents two models derived from the FASTPROP and CHEMPROP architectures and trained on BigSolDB, which are capable of predicting solubility at arbitrary temperatures for a wide range of small molecules in organic solvent. Both extrapolate to unseen solutes 2–3 times more accurately than the current state-of-the-art model, and we demonstrate that they are approaching the aleatoric limit (0.5–1 $\log S$) of available test data, suggesting that further improvements in prediction accuracy require more accurate datasets. The FASTPROP-derived model (called FASTSOLV) and the CHEMPROP-based model are open source, freely accessible via a Python package and web interface, highly reproducible, and up to 2 orders of magnitude faster than current alternatives.
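The aleatoric limit mentioned in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' analysis: it assumes that measured solubilities equal a true value plus Gaussian experimental noise, and it scores a hypothetical oracle model that predicts the true value exactly. The function names, the uniform range of true values, and the noise model are all assumptions made for illustration; only the 0.5–1 log S noise range comes from the abstract.

```python
import math
import random


def rmse(predictions, targets):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def simulate_noise_floor(noise_sd, n_samples=20000, seed=0):
    """RMSE of an oracle model scored against noisy measurements.

    Assumption: experimental noise is Gaussian with standard deviation
    `noise_sd` (in log S units) and independent of the true value.
    """
    rng = random.Random(seed)
    # Hypothetical true log S values; the range is arbitrary.
    true_log_s = [rng.uniform(-6.0, 1.0) for _ in range(n_samples)]
    # Measured labels are the truth plus experimental noise.
    measured_log_s = [v + rng.gauss(0.0, noise_sd) for v in true_log_s]
    # The oracle predicts the true value, yet is scored on noisy labels.
    return rmse(true_log_s, measured_log_s)


for noise_sd in (0.5, 1.0):
    floor = simulate_noise_floor(noise_sd)
    print(f"noise SD = {noise_sd:.1f} log S -> oracle RMSE = {floor:.2f}")
```

Under these assumptions, even a perfect model is left with an error close to the noise standard deviation, which is why a model already near that floor can improve its measured accuracy only if the test data themselves become more accurate.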
Date: 2025
Downloads: (external link)
https://www.nature.com/articles/s41467-025-62717-7 Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-62717-7
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-025-62717-7
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.