The Moral Mind(s) of Large Language Models
Avner Seror
Working Papers from HAL
Abstract:
As large language models (LLMs) become integrated into decision-making across various sectors, a key question arises: do they exhibit an emergent "moral mind" (a consistent set of moral principles guiding their ethical judgments), and is this reasoning uniform or diverse across models? To investigate this, we presented about forty different models from the main providers with a large array of structured ethical scenarios, creating one of the largest datasets of its kind. Our rationality tests revealed that at least one model from each provider demonstrated behavior consistent with stable moral principles, effectively acting as if it were approximately optimizing a utility function encoding ethical reasoning. We identified these utility functions and observed a notable clustering of models around neutral ethical stances. To investigate variability, we introduced a novel non-parametric permutation approach, revealing that the most rational models shared 59% to 76% of their ethical reasoning patterns. Despite this shared foundation, differences emerged: roughly half of these models displayed greater moral adaptability, bridging diverse perspectives, while the remainder adhered to more rigid ethical structures.
Keywords: Decision Theory; Revealed Preference; Rationality; Artificial Intelligence; LLM; PSM
Date: 2024-11-19
Note: View the original document on HAL open archive server: https://hal.science/hal-04798963v1
Downloads: https://hal.science/hal-04798963v1/document (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:hal:wpaper:hal-04798963
Bibliographic data for series maintained by CCSD.