
Benchmarking Municipal AI Chatbot Performance: Mixed Methods Insights into Competence, Integrity, and Algorithmic Discrimination in Dutch Public Administration

Ralfs Rudzitis and Kristina Sabrina Weißmüller
Additional contact information
Kristina Sabrina Weißmüller: Vrije Universiteit Amsterdam

No me7pf_v1, SocArXiv from Center for Open Science

Abstract: This study introduces the Public-sector Chatbot Performance (PCP) framework, a novel and comprehensive approach to systematically assessing AI chatbot performance in public administration. The framework evaluates both technical competence (factual accuracy, completeness, and source reliability) and normative integrity, including lawfulness, transparency, equality, and privacy. To demonstrate the applicability of the PCP framework, we benchmark the full set of municipal chatbot systems currently deployed in Dutch local governments, alongside two leading proprietary large language models (LLMs): ChatGPT-4o and Gemini 2.5 Pro. Using a pragmatic mixed-methods approach, we developed 26 prompts with systematic user-based variation to explore algorithmic bias, resulting in a dataset of n = 326 user-chatbot interactions. Quantitative analysis revealed that ChatGPT-4o achieved a composite performance score of 95.7%, significantly outperforming all municipal systems. Municipal chatbots exhibited notable shortcomings in competence and integrity, with some failing to meet basic standards of lawful and equal service provision. Exploratory qualitative analysis further uncovered algorithmic opacity, discretionary advice in violation of Dutch good governance regulations, and discriminatory responses based on “ethnic” usernames. These insights challenge assumptions about neutrality in public sector AI and underscore the need for ethical benchmarks in chatbot evaluation. The PCP framework offers actionable guidance for policymakers, technologists, and scholars committed to responsible digital governance.

Date: 2025-09-16

Downloads: (external link)
https://osf.io/download/68c7dd8a588bd35d40617bdf/


Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:me7pf_v1

DOI: 10.31219/osf.io/me7pf_v1


 
Page updated 2025-09-20
Handle: RePEc:osf:socarx:me7pf_v1