Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory
Daniel Schwalbe-Koda,
Sebastien Hamel,
Babak Sadigh,
Fei Zhou and
Vincenzo Lordi
Additional contact information
Daniel Schwalbe-Koda: Lawrence Livermore National Laboratory
Sebastien Hamel: Lawrence Livermore National Laboratory
Babak Sadigh: Lawrence Livermore National Laboratory
Fei Zhou: Lawrence Livermore National Laboratory
Vincenzo Lordi: Lawrence Livermore National Laboratory
Nature Communications, 2025, vol. 16, issue 1, 1-13
Abstract:
An accurate description of information is relevant for a range of problems in atomistic machine learning (ML), such as crafting training sets, performing uncertainty quantification (UQ), or extracting physical insights from large datasets. However, atomistic ML often relies on unsupervised learning or model predictions to analyze information contents from simulation or training data. Here, we introduce a theoretical framework that provides a rigorous, model-free tool to quantify information contents in atomistic simulations. We demonstrate that the information entropy of a distribution of atom-centered environments explains known heuristics in ML potential developments, from training set sizes to dataset optimality. Using this tool, we propose a model-free UQ method that reliably predicts epistemic uncertainty and detects out-of-distribution samples, including rare events in systems such as nucleation. This method provides a general tool for data-driven atomistic modeling and combines efforts in ML, simulations, and physical explainability.
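Illustration (not part of the published abstract): the abstract does not say how the information entropy is estimated, so the following is only a minimal sketch under an assumption, namely a Gaussian kernel density estimate over fixed-length descriptor vectors of atom-centered environments. The function names, the `bandwidth` parameter, and the toy descriptors are hypothetical and should not be read as the authors' implementation.

```python
import math

def gaussian_kernel(a, b, bandwidth):
    """Unnormalized Gaussian kernel between two descriptor vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def dataset_entropy(descriptors, bandwidth=1.0):
    """Kernel-based entropy estimate (in nats) of a set of descriptors.

    With this unnormalized kernel the value is 0 when all environments
    coincide and log(n) when all n environments are far apart.
    """
    n = len(descriptors)
    total = 0.0
    for xi in descriptors:
        # Mean kernel overlap of xi with the whole set (at least 1/n).
        overlap = sum(gaussian_kernel(xi, xj, bandwidth) for xj in descriptors) / n
        total += math.log(overlap)
    return -total / n

def novelty_score(sample, reference, bandwidth=1.0):
    """Model-free novelty of one environment relative to a reference set.

    Negative or near-zero values mean the sample overlaps with known
    environments; large positive values flag out-of-distribution samples.
    """
    overlap = sum(gaussian_kernel(sample, xj, bandwidth) for xj in reference)
    # Floor the overlap so that log() stays finite for very distant samples.
    return -math.log(max(overlap, 1e-300))

# Toy two-dimensional descriptors (hypothetical data).
redundant = [[0.0, 0.0]] * 5
diverse = [[0.0, 0.0], [100.0, 0.0], [200.0, 0.0], [300.0, 0.0]]

entropy_redundant = dataset_entropy(redundant)
entropy_diverse = dataset_entropy(diverse)
score_known = novelty_score([0.0, 0.0], redundant)
score_unseen = novelty_score([50.0, 0.0], redundant)
```

In this sketch a dataset of repeated environments carries no information, a dataset of mutually distant environments reaches the maximum of log(n), and the sign of `novelty_score` separates a familiar environment from an unseen one without any trained model.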
Date: 2025
Downloads: (external link)
https://www.nature.com/articles/s41467-025-59232-0 Abstract (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59232-0
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-025-59232-0
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.