Benchmarking Municipal AI Chatbot Performance: Mixed Methods Insights into Competence, Integrity, and Algorithmic Discrimination in Dutch Public Administration
Ralfs Rudzitis and Kristina Sabrina Weißmüller
Additional contact information
Kristina Sabrina Weißmüller: Vrije Universiteit Amsterdam
No me7pf_v1, SocArXiv from Center for Open Science
Abstract:
This study introduces the Public-sector Chatbot Performance (PCP) framework, a novel and comprehensive approach to systematically assess AI chatbot performance in public administration. The framework evaluates both technical competence (factual accuracy, completeness, and source reliability) and normative integrity, including lawfulness, transparency, equality, and privacy. To demonstrate the applicability of the PCP framework, we benchmark the full set of municipal chatbot systems currently deployed in Dutch local governments, alongside two leading proprietary large language models (LLMs): ChatGPT-4o and Gemini 2.5 Pro. Using a pragmatic mixed methods approach, we developed 26 prompts with systematic user-based variation to explore algorithmic bias, resulting in a dataset of n = 326 user-chatbot interactions. Quantitative analysis revealed that ChatGPT-4o achieved a composite performance score of 95.7%, significantly outperforming all municipal systems. Municipal chatbots exhibited notable shortcomings in competence and integrity, with some failing to meet basic standards of lawful and equal service provision. Exploratory qualitative analysis further uncovered algorithmic opacity, discretionary advice in violation of Dutch good governance regulations, and discriminatory responses based on “ethnic” usernames. These insights challenge assumptions about neutrality in public sector AI and underscore the need for ethical benchmarks in chatbot evaluation. The PCP framework offers actionable guidance for policymakers, technologists, and scholars committed to responsible digital governance.
Date: 2025-09-16
Download: https://osf.io/download/68c7dd8a588bd35d40617bdf/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:me7pf_v1
DOI: 10.31219/osf.io/me7pf_v1