Creating a general-purpose generative model for healthcare data based on multiple clinical studies
Hiroshi Maruyama,
Kotatsu Bito,
Yuki Saito,
Masanobu Hibi,
Shun Katada,
Aya Kawakami,
Kenta Oono,
Nontawat Charoenphakdee,
Zhengyan Gao,
Hideyoshi Igata,
Masashi Yoshikawa,
Yoshiaki Ota,
Hiroki Okui,
Kei Akita,
Shoichiro Yamaguchi,
Yohei Sugawara and
Shin-ichi Maeda
PLOS Digital Health, 2025, vol. 4, issue 11, 1-25
Abstract:
Data for healthcare applications are typically customized for specific purposes but are often difficult to access due to high costs and privacy concerns. Rather than prepare separate datasets for individual applications, we propose a novel approach: building a general-purpose generative model applicable to virtually any type of healthcare application. This generative model encompasses a broad range of human attributes, including age, sex, anthropometric measurements, blood components, physical performance metrics, and numerous healthcare-related questionnaire responses. To achieve this goal, we integrated the results of multiple clinical studies into a unified training dataset and developed a generative model to replicate its characteristics. The model can estimate missing attribute values from known attribute values and generate synthetic datasets for various applications. Our analysis confirmed that the model captures key statistical properties of the training dataset, including univariate distributions and bivariate relationships. We demonstrate the model’s practical utility through multiple real-world applications, illustrating its potential impact on predictive, preventive, and personalized medicine.
Author summary: Digital technologies are expected to revolutionize healthcare, yet digital healthcare has not reached its full potential. A major bottleneck is the poor data availability. Due to concerns regarding privacy and cost, healthcare data is very difficult to access. Here, our aim was to provide a general-purpose statistical model that can be used in place of actual data. Recent advancements in machine-learning technology, especially in generative models, make this challenging goal possible. We built a model that captures complex statistical interactions among more than 2000 human attributes and made it available as a software service on the Internet. The model can be used for estimating unknown attributes from known attributes and generating synthetic data. We believe that this model significantly lowers the barrier to entry into digital healthcare and will stimulate future innovations.
Date: 2025
Downloads: (external link)
https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0001059 (text/html)
https://journals.plos.org/digitalhealth/article/fi ... 01059&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pdig00:0001059
DOI: 10.1371/journal.pdig.0001059
More articles in PLOS Digital Health from Public Library of Science