How large is "large enough" ? Large-scale experimental investigation of the reliability of confidence measures
Clémentine Bouleau,
Nicolas Jacquemet and
Maël Lebreton
Additional contact information
Clémentine Bouleau: PSE - Paris School of Economics - UP1 - Université Paris 1 Panthéon-Sorbonne - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris Sciences et Lettres - EHESS - École des hautes études en sciences sociales - ENPC - École nationale des ponts et chaussées - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique
Maël Lebreton: PSE - Paris School of Economics - UP1 - Université Paris 1 Panthéon-Sorbonne - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris Sciences et Lettres - EHESS - École des hautes études en sciences sociales - ENPC - École nationale des ponts et chaussées - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, PJSE - Paris Jourdan Sciences Economiques - UP1 - Université Paris 1 Panthéon-Sorbonne - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris Sciences et Lettres - EHESS - École des hautes études en sciences sociales - ENPC - École nationale des ponts et chaussées - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, UNIGE - Université de Genève = University of Geneva
PSE Working Papers from HAL
Abstract:
Whether individuals feel confident that their own actions, choices, or statements are correct, and how these confidence levels differ between individuals, are two key primitives for countless behavioral theories and phenomena. In cognitive tasks, individual confidence is typically measured as the average of reports about choice accuracy, but how reliable the resulting characterization of within- and between-individual confidence is remains surprisingly undocumented. Here, we perform a large-scale resampling exercise in the Confidence Database to investigate the reliability of individual confidence estimates, and of comparisons across individuals' confidence levels. Our results show that confidence estimates are more stable than their choice-accuracy counterparts, reaching a reliability plateau after roughly 50 trials, regardless of a number of task design characteristics. While constituting a reliability upper bound for task-based confidence measures, and thereby leaving open the question of the reliability of the construct itself, these results characterize the robustness of past and future task designs.
Keywords: Confidence; Accuracy; Reliability; Design of experiments; Multiple trials
Date: 2025-01
New Economics Papers: this item is included in nep-exp and nep-neu
Note: View the original document on HAL open archive server: https://shs.hal.science/halshs-04893009v1
Downloads:
https://shs.hal.science/halshs-04893009v1/document (application/pdf)
Related works:
Working Paper: How large is "large enough" ? Large-scale experimental investigation of the reliability of confidence measures (2025) 
Persistent link: https://EconPapers.repec.org/RePEc:hal:psewpa:halshs-04893009
Bibliographic data for series maintained by CCSD.