A comprehensive investigation of variational auto-encoders for population synthesis

Sané, Abdoul Razac; Vandanjon, Pierre-Olivier; Belaroussi, Rachid; Hankach, Pierre

Abdoul Razac Sané, Pierre-Olivier Vandanjon, Rachid Belaroussi and Pierre Hankach
Additional contact information
Abdoul Razac Sané: University Gustave Eiffel
Pierre-Olivier Vandanjon: University Gustave Eiffel
Rachid Belaroussi: University Gustave Eiffel
Pierre Hankach: University Gustave Eiffel

Journal of Computational Social Science, 2025, vol. 8, issue 1, No 13, 34 pages

Abstract: The use of synthetic populations has grown considerably in recent years, transforming studies in various fields, including social science research, urban planning, public health and transportation modeling. These synthetic populations are valuable substitutes for real data, which are often missing or sensitive, and can preserve both privacy and representativeness. They are typically constructed from aggregate and/or sample data. Recently, new methods for generating synthetic populations based on deep learning, notably Variational Autoencoders (VAEs), have been developed. Such methods overcome the limitations of traditional methods, such as Iterative Proportional Fitting (IPF), which are unable to generate agents with cross-modalities not found in the sample data. As such, IPF requires large samples to generate a synthetic population closely resembling the actual one. Conversely, the advantage of VAEs lies in their ability to generate agents not found in the sample data, albeit with the risk of creating agents that do not exist in the actual population. However, practical documentation and detailed analyses of the architectures and implementation results of these deep learning approaches, in particular VAEs, are limited, which makes these methods difficult for practitioners to adopt. This paper focuses on generating synthetic populations using VAEs. First, an in-depth and accessible theoretical explanation of how VAEs function is provided. Next, a detailed study of these methods is carried out by testing the various architectures, parameters, sample sizes and evaluation indicators necessary to guarantee high-quality results. The study highlights the ability of VAEs to generate large datasets from a small training sample, as well as their performance in generating new, realistic individuals not present in the training set. Certain limitations are identified, including the difficulties VAEs encounter in handling numerical attributes and the need for post-processing to eliminate unrealistic individuals. In conclusion, despite a number of limitations, VAEs constitute a very promising methodology for generating synthetic populations and offer practitioners numerous advantages. This paper is accompanied by a Python notebook to help interested readers implement this new methodology.
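
Illustrative note (not taken from the paper or its notebook): the abstract's remark that IPF cannot generate cross-modalities absent from the sample can be seen in a minimal two-attribute sketch. The function name `ipf`, the attribute labels and all counts below are hypothetical, chosen only for illustration. Because IPF only rescales cells multiplicatively, a cell that is zero in the sample stays zero after fitting.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, n_iter=100):
    """Fit a 2-D seed table to row and column marginals by alternate scaling."""
    table = seed.astype(float).copy()
    for _ in range(n_iter):
        # Rescale each row so that it sums to its target (skip empty rows).
        row_sums = table.sum(axis=1, keepdims=True)
        table *= np.divide(row_targets[:, None], row_sums,
                           out=np.ones_like(row_sums), where=row_sums > 0)
        # Rescale each column so that it sums to its target (skip empty columns).
        col_sums = table.sum(axis=0, keepdims=True)
        table *= np.divide(col_targets[None, :], col_sums,
                           out=np.ones_like(col_sums), where=col_sums > 0)
    return table

# Hypothetical sample counts: rows = two age groups, columns = two employment statuses.
# The combination (second age group, first status) was never observed in the sample.
seed = np.array([[10, 5],
                 [0, 8]])
row_targets = np.array([600.0, 400.0])  # assumed aggregate totals per age group
col_targets = np.array([300.0, 700.0])  # assumed aggregate totals per status

fitted = ipf(seed, row_targets, col_targets)
print(fitted)
```

In this sketch the fitted table matches both sets of marginals, yet the unobserved combination keeps a count of exactly zero, whatever the targets. A generative model such as a VAE samples from a learned distribution instead, so it can produce such combinations, including ones that do not exist in the real population.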

Keywords: Synthetic population; Machine learning; Deep generative model; Variational autoencoders; Sampling zeros; Structural zeros
Date: 2025

Downloads: (external link)
http://link.springer.com/10.1007/s42001-024-00332-0 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:jcsosc:v:8:y:2025:i:1:d:10.1007_s42001-024-00332-0

Ordering information: This journal article can be ordered from
http://www.springer. ... iences/journal/42001

DOI: 10.1007/s42001-024-00332-0

Journal of Computational Social Science is currently edited by Takashi Kamihigashi

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.