Missing data in amortized simulation-based neural posterior estimation

Wang, Zijian; Hasenauer, Jan; Schälte, Yannik

PLOS Computational Biology, 2024, vol. 20, issue 6, 1-17

Abstract: Amortized simulation-based neural posterior estimation provides a novel machine-learning-based approach for solving parameter estimation problems. It has been shown to be computationally efficient and able to handle complex models and data sets. Yet, the available approach cannot handle missing data, a case that is ubiquitous in experimental studies, and might then provide incorrect posterior estimates. In this work, we discuss various ways of encoding missing data and integrate them into the training and inference process. We implement the approaches in the BayesFlow methodology, an amortized estimation framework based on invertible neural networks, and evaluate their performance on multiple test problems. We find that an approach in which the data vector is augmented with binary indicators of presence or absence of values performs the most robustly. It also improved performance on the simpler problem of data sets with variable length. Accordingly, we demonstrate that amortized simulation-based inference approaches are applicable even with missing data, and we provide a guideline for handling such data, which is relevant for a broad spectrum of applications.

Author summary: In biomedical research, mechanistic models describe dynamic processes, yet inferring their underlying parameters can often be challenging. Bayesian statistics provides an established framework for this by integrating prior knowledge with observed data, and it naturally enables uncertainty quantification because a distribution of parameter values is returned. However, classical case-based methods for Bayesian inference can be computationally expensive, particularly when the same model needs to be fitted to different data sets. Recently, deep-learning-based approaches have been developed to streamline the inference procedure, allowing the upfront training cost to amortize when applied to multiple data sets. In this manuscript, we explore approaches to extend the setup to data sets with missing data. In summary, an encoding scheme that augments the data with binary indicators of presence or absence performs the most robustly across different test problems.
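
The encoding described above, in which the data vector is augmented with binary indicators of presence or absence, can be illustrated with a minimal sketch. This is not the authors' BayesFlow implementation: the function name `encode_missing`, the convention that missing entries arrive as `None` or NaN, and the placeholder `fill_value` of 0.0 are all assumptions made for illustration.

```python
import math


def encode_missing(values, fill_value=0.0):
    """Augment a data vector with binary presence indicators.

    Assumption of this sketch: missing entries are given as None or NaN.
    Each entry becomes a pair (value, indicator), where the indicator is
    1.0 if the value was observed and 0.0 if it was missing. Missing
    values are replaced by `fill_value` (a hypothetical placeholder).
    """
    encoded = []
    for v in values:
        is_missing = v is None or (isinstance(v, float) and math.isnan(v))
        if is_missing:
            encoded.append((fill_value, 0.0))
        else:
            encoded.append((float(v), 1.0))
    return encoded


# Example: a time series with two missing measurements.
observations = [0.8, None, 1.3, float("nan")]
encoded = encode_missing(observations)
```

With this encoding, an observed value that happens to equal the placeholder, such as a measured 0.0, is encoded as (0.0, 1.0) and so remains distinguishable from a missing entry, which is encoded as (0.0, 0.0).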

Date: 2024

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012184 (text/html)
https://journals.plos.org/ploscompbiol/article?id= ... 12184&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1012184

DOI: 10.1371/journal.pcbi.1012184

More articles in PLOS Computational Biology from Public Library of Science