Steering Prosocial AI Agents: Computational Basis of LLM's Decision Making in Social Simulation
Ji Ma (The University of Texas at Austin)
No 8p7wg_v1, OSF Preprints from Center for Open Science
Abstract:
Large language models (LLMs) increasingly serve as human-like decision-making agents in social science and applied settings. These LLM-agents are typically assigned human-like characters and placed in real-life contexts. However, how these characters and contexts shape an LLM's behavior remains underexplored. This study proposes and tests methods for probing, quantifying, and modifying an LLM's internal representations in a Dictator Game, a classic behavioral experiment on fairness and prosocial behavior. We extract "vectors of variable variations" (e.g., "male" to "female") from the LLM's internal state. Manipulating these vectors during the model's inference can substantially alter how those variables relate to the model's decision-making. This approach offers a principled way to study and regulate how social concepts can be encoded and engineered within transformer-based models, with implications for alignment, debiasing, and designing AI agents for social simulations in both academic and commercial applications.
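The manipulation the abstract describes, extracting a direction in hidden-state space from contrastive persona prompts and adding it back during inference, is commonly implemented as difference-in-means activation steering with a forward hook. The sketch below illustrates that general recipe only; the model (GPT-2), layer index, prompts, and steering strength ALPHA are illustrative assumptions, not the paper's actual setup.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # illustrative stand-in; the paper does not name a model here
LAYER = 6             # illustrative choice of transformer block
ALPHA = 4.0           # steering strength; sign and scale are tuning knobs

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def mean_hidden(prompt):
    """Token-averaged hidden state at the output of block LAYER."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so index LAYER + 1
    # is the output of transformer block LAYER.
    return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)

# Contrastive prompts that vary only the persona attribute of interest.
male_prompts = ["You are a man deciding how much to give in a dictator game."]
female_prompts = ["You are a woman deciding how much to give in a dictator game."]

# "Vector of variable variation": difference of mean activations.
v = (torch.stack([mean_hidden(p) for p in female_prompts]).mean(0)
     - torch.stack([mean_hidden(p) for p in male_prompts]).mean(0))

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # adding ALPHA * v shifts the residual stream along the extracted direction.
    return (output[0] + ALPHA * v.to(output[0].dtype),) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
try:
    prompt = "You are a man deciding how much to give in a dictator game. You give $"
    ids = tok(prompt, return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=10, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    print(tok.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so later calls run unsteered

In practice one would average the difference vector over many paired prompts and sweep LAYER and ALPHA, since both the layer at which a social attribute is linearly represented and the usable steering magnitude vary by model.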
Date: 2025-04-18
New Economics Papers: this item is included in nep-ain, nep-big and nep-exp
Downloads: https://osf.io/download/680059f62b7c372b72764110/
Persistent link: https://EconPapers.repec.org/RePEc:osf:osfxxx:8p7wg_v1
DOI: 10.31219/osf.io/8p7wg_v1