Model-free estimation of completeness, uncertainties, and outliers in atomistic machine learning using information theory

Daniel Schwalbe-Koda, Sebastien Hamel, Babak Sadigh, Fei Zhou and Vincenzo Lordi
Additional contact information
Daniel Schwalbe-Koda: Lawrence Livermore National Laboratory
Sebastien Hamel: Lawrence Livermore National Laboratory
Babak Sadigh: Lawrence Livermore National Laboratory
Fei Zhou: Lawrence Livermore National Laboratory
Vincenzo Lordi: Lawrence Livermore National Laboratory

Nature Communications, 2025, vol. 16, issue 1, 1-13

Abstract: An accurate description of information is relevant for a range of problems in atomistic machine learning (ML), such as crafting training sets, performing uncertainty quantification (UQ), or extracting physical insights from large datasets. However, atomistic ML often relies on unsupervised learning or model predictions to analyze information contents from simulation or training data. Here, we introduce a theoretical framework that provides a rigorous, model-free tool to quantify information contents in atomistic simulations. We demonstrate that the information entropy of a distribution of atom-centered environments explains known heuristics in ML potential developments, from training set sizes to dataset optimality. Using this tool, we propose a model-free UQ method that reliably predicts epistemic uncertainty and detects out-of-distribution samples, including rare events in systems such as nucleation. This method provides a general tool for data-driven atomistic modeling and combines efforts in ML, simulations, and physical explainability.

Date: 2025

Downloads: (external link)
https://www.nature.com/articles/s41467-025-59232-0 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59232-0

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-025-59232-0

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-05-01
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59232-0