Partial Identification from LLM Prompts

Xiaohong Chen and Elie Tamer

Abstract: Large language models are increasingly used as binary classifiers when the true label is latent. We study partial identification of the prevalence $\theta = P(X^* = 1)$ from panels of LLM reports whose errors may be arbitrarily dependent given the truth. The design of replication determines the observable, and hence the identifying content: repeated prompts to one model yield a count, several named models a response vector, and both a response matrix. Cast as a two-component finite mixture, the problem makes the identification failure transparent: absent restrictions that separate the latent components, the prevalence $\theta$ is completely unidentified, and weak stochastic-ordering restrictions (first-order dominance, monotone likelihood ratio, mean ordering) leave the identified set at $[0,1]$. Identifying power comes instead from externally calibrated scores and events, which discipline the mixture in the spirit of the misclassification and corrupted-data literature. We characterize the resulting bounds, establishing validity and sharpness, and give an exact account of the identifying information in the full score distribution beyond its mean. When named models are asked repeated versions of the same question, what identifies $\theta$ is not the number of positive answers but which models agree across prompts -- a feature a vote count discards. An extension derives implied bounds on regression coefficients when $X^*$ is a regressor of interest that is not directly observed.
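
The failure of identification described in the abstract can be seen in a few lines. The sketch below is illustrative and is not taken from the paper: the symbols $Y$ (the observable count, vector, or matrix), $P$ (its distribution), and $F_1$, $F_0$ (the distributions of $Y$ given $X^* = 1$ and $X^* = 0$) are notation assumed here for exposition.

```latex
% Two-component mixture: the data reveal only P, never (\theta, F_1, F_0) separately.
P(y) = \theta\, F_1(y) + (1 - \theta)\, F_0(y), \qquad \theta = P(X^* = 1).

% Take any candidate \tilde\theta \in [0,1] and set both components equal to P:
\tilde\theta\, P(y) + (1 - \tilde\theta)\, P(y) = P(y) \quad \text{for all } y.

% Every \tilde\theta therefore reproduces the observed distribution,
% so without further restrictions the identified set for \theta is [0,1].
% The choice F_1 = F_0 = P also satisfies any weak ordering of F_1 relative to F_0
% (first-order dominance, monotone likelihood ratio, mean ordering all hold with equality),
% which is why weak ordering restrictions alone do not shrink the set.
```

Under this reading, a restriction has identifying power only if it rules out $F_1 = F_0$, which is the role the abstract assigns to externally calibrated scores and events.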

Date: 2026-06

Downloads: (external link)
http://arxiv.org/pdf/2606.15031 Latest version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2606.15031

Series: Papers from arXiv.org