Generative data modelling for diverse populations in Africa: insights from South Africa
Sally Sonia Simmons,
John Elvis Hagan and
Thomas Schack
LSE Research Online Documents on Economics from London School of Economics and Political Science, LSE Library
Abstract:
Studies on the demography and health of racially diverse African populations are scarce, particularly due to lingering data challenges. Generative data modelling has emerged as a valuable solution to this burden. The study, therefore, examined the efficacy of Conditional Tabular GAN (CTGAN), CopulaGAN, and Tabula Variational Autoencoder (TVAE) for generating synthetic but realistic demographic and health data. This study employed the World Health Organisation stigy on global ageing and adult health survey (SAGE) Wave 1 South African data (n = 4227). Information missing from SAGE Wave 1, including demographic (e.g., race, age) and health (e.g., hypertension, blood pressure) indicators, were imputed using Generative Adversarial Imputation Nets (GAIN). CopulaGAN, CTGAN, and TVAE, sourced from the sdv 1.24.1 python library, generated 104,227 synthetic records based on the SAGE data constituents. The outcomes were accessed with similarity and machine learning (XGBoost) augmentation metrics (sourced from the sdmetrics 0.21.0 python library), including column shapes and overall and precision ratio scores. Generally, the GAIN imputations resulted in data with properties that were comparable to original and with no missing information. CTGAN’s (89.20%) overall quality of performance was above that of TVAE (86.50%) and CopulaGAN (88.45%). These findings underscore the usefulness of generative data modelling in addressing data quality challenges in diverse populations to enhance actionable health research and policy implementation.
Keywords: Africa; CTGAN; CopulaGAN; GAIN; South Africa; TVAE; deomography; generative data modelling; synthetic data (search for similar items in EconPapers)
JEL-codes: C63 C81 I10 J11 (search for similar items in EconPapers)
Pages: 17 pages
Date: 2025-07-17
References: Add references at CitEc
Citations:
Published in Information, 17, July, 2025, 16(7). ISSN: 2078-2489
Downloads: (external link)
http://eprints.lse.ac.uk/129032/ Open access version. (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:ehl:lserod:129032
Access Statistics for this paper
More papers in LSE Research Online Documents on Economics from London School of Economics and Political Science, LSE Library LSE Library Portugal Street London, WC2A 2HD, U.K.. Contact information at EDIRC.
Bibliographic data for series maintained by LSERO Manager ().