Synthetic Data as a Proxy for Real-World Electronic Health Records in the Patient Length of Stay Prediction
Dominik Bietsch,
Robert Stahlbock and
Stefan Voß
Additional contact information
Dominik Bietsch: Institute of Information Systems, University of Hamburg, Von-Melle-Park 5, 20146 Hamburg, Germany
Robert Stahlbock: Institute of Information Systems, University of Hamburg, Von-Melle-Park 5, 20146 Hamburg, Germany
Stefan Voß: Institute of Information Systems, University of Hamburg, Von-Melle-Park 5, 20146 Hamburg, Germany
Sustainability, 2023, vol. 15, issue 18, 1-30
Abstract:
While generative artificial intelligence has gained popularity, e.g., for the creation of images, it can also be used to create synthetic tabular data. This holds great potential, especially for the healthcare industry, where data are often scarce and subject to privacy restrictions. For instance, the creation of synthetic electronic health records (EHR) promises to improve the use of machine learning algorithms, which usually require large amounts of data. This also applies to the prediction of the patient length of stay (LOS), a key measure for hospitals and one of the core tools for decision makers to plan the allocation of resources. This paper therefore aims to add to the still-young research on the application of generative adversarial nets (GANs) to tabular EHR. It does so with the intention of leveraging the advantages of synthetic data for the prediction of the LOS in order to contribute to the efficiency-enhancing and cost-saving aspirations of hospitals and insurance companies. To this end, it examines the applicability of synthetic data generated by GANs as a proxy for scarce real-world EHR in the patient LOS multi-class classification task. The Conditional Tabular GAN (CTGAN) and the Copula GAN are selected as the underlying models, as they are state-of-the-art GAN architectures designed for generating synthetic tabular data. The CTGAN is found to be the superior model for the underlying use case. Nevertheless, the paper shows that there is still room for improvement when applying state-of-the-art GAN architectures to clinical healthcare data.
Keywords: generative artificial intelligence; synthetic tabular data; healthcare industry; synthetic electronic health records (EHR); patient length of stay (LOS)
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2023
Downloads: (external link)
https://www.mdpi.com/2071-1050/15/18/13690/pdf (application/pdf)
https://www.mdpi.com/2071-1050/15/18/13690/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:15:y:2023:i:18:p:13690-:d:1239261
Sustainability is currently edited by Ms. Alexandra Wu
Bibliographic data for series maintained by MDPI Indexing Manager.