(Human) Attention Is (Still) All You Need: Human oversight makes AI-assisted social science reliable

Zhu, Chen; Wang, Xiaolu; Zhang, Weilong

Abstract: Large language models (LLMs) are increasingly used for tasks once reserved for trained researchers, including hypothesis generation, specification choice, and drafting conclusions. We argue that the reliability of AI-assisted research depends not only on model capability, but also on how cognitive labour is structured between humans and machines. We study this problem through Human-in-the-Loop Economic Research (HLER), a decision architecture based on pre-commitment, decision sequencing, accountability, and attention allocation. In a pre-specified 2×4 factorial experiment with 280 complete research runs across four datasets, an unconstrained multi-agent baseline produced critical failures in 72% of runs. Using the same underlying model, the same agent decomposition, and identical prompts for the shared reasoning agents, HLER reduced the failure rate to 16% by imposing three architectural commitments: LLMs reason but do not execute data work, data and estimation are handled deterministically, and three human decision gates bind the workflow. Fisher's exact test rejects equality of failure rates at p

Date: 2026-06
New Economics Papers: this item is included in nep-ain and nep-cmp

Downloads: (external link)
https://arxiv.org/pdf/2606.12848 Latest version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2606.12848

More papers in Papers from arXiv.org