Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods

Volkova, Svitlana; Arendt, Dustin; Saldanha, Emily; Glenski, Maria; Ayton, Ellyn; Cottam, Joseph; Aksoy, Sinan; Jefferson, Brett; Shrivaram, Karthnik

Svitlana Volkova, Dustin Arendt, Emily Saldanha, Maria Glenski, Ellyn Ayton, Joseph Cottam, Sinan Aksoy, Brett Jefferson and Karthnik Shrivaram
Additional contact information
Svitlana Volkova: Pacific Northwest National Laboratory
Dustin Arendt: Pacific Northwest National Laboratory
Emily Saldanha: Pacific Northwest National Laboratory
Maria Glenski: Pacific Northwest National Laboratory
Ellyn Ayton: Pacific Northwest National Laboratory
Joseph Cottam: Pacific Northwest National Laboratory
Sinan Aksoy: Pacific Northwest National Laboratory
Brett Jefferson: Pacific Northwest National Laboratory
Karthnik Shrivaram: Pacific Northwest National Laboratory

Computational and Mathematical Organization Theory, 2023, vol. 29, issue 1, No 8, 220-241

Abstract: The Ground Truth program was designed to evaluate social science modeling approaches using simulation test beds with ground truth intentionally and systematically embedded, in order to understand and model complex Human Domain systems and their dynamics (Lazer et al., Science 369:1060–1062, 2020). Our multidisciplinary team of data scientists, statisticians, and experts in Artificial Intelligence (AI) and visual analytics had a unique role on the program: to investigate the accuracy, reproducibility, generalizability, and robustness of state-of-the-art (SOTA) causal structure learning approaches applied to fully observed and sampled simulated data across virtual worlds. In addition, we analyzed the feasibility of using machine learning models to predict future social behavior with and without causal knowledge explicitly embedded. In this paper, we first present our causal modeling approach to discover the causal structure of four virtual worlds produced by the simulation teams: Urban Life, Financial Governance, Disaster, and Geopolitical Conflict. Our approach adapts state-of-the-art causal discovery (including ensemble models), machine learning, data analytics, and visualization techniques to allow a human-machine team to reverse-engineer the true causal relations from sampled and fully observed data. We next present our reproducibility analysis of two research methods teams' performance using a range of causal discovery models applied to both sampled and fully observed data, and analyze their effectiveness and limitations. We further investigate the generalizability and robustness to sampling of the SOTA causal discovery approaches on additional simulated datasets with known ground truth. Our results reveal the limitations of existing causal modeling approaches when applied to large-scale, noisy, high-dimensional data with unobserved variables and unknown relationships between them.
We show that the SOTA causal models explored in our experiments are not designed to take advantage of vast amounts of data and have difficulty recovering ground truth when latent confounders are present; they do not generalize well across simulation scenarios and are not robust to sampling; and they are vulnerable to data and modeling assumptions, so their results are hard to reproduce. Finally, we outline lessons learned and provide recommendations to improve models for causal discovery and prediction of human social behavior from observational data. We highlight the importance of learning data-to-knowledge representations or transformations to improve causal discovery, and describe the benefit of causal feature selection for predictive and prescriptive modeling.

Keywords: Causal discovery; Causal structure learning; Ensemble models; Reproducibility; Generalizability; Robustness; Predictive modeling; Machine learning; Data science
Date: 2023

Downloads: (external link)
http://link.springer.com/10.1007/s10588-021-09351-y Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:comaot:v:29:y:2023:i:1:d:10.1007_s10588-021-09351-y

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10588

DOI: 10.1007/s10588-021-09351-y

Computational and Mathematical Organization Theory is currently edited by Terrill Frantz and Kathleen Carley
