Friend or Foe: Delegating to an AI Whose Alignment is Unknown

Fudenberg, Drew; Liang, Annie

Abstract: AI systems have the potential to improve decision-making, but decision makers face the risk that the AI may be misaligned with their objectives. We study this problem in the context of a treatment decision, where a designer decides which patient attributes to reveal to an AI before receiving a prediction of the patient's need for treatment. Providing the AI with more information increases the benefits of an aligned AI but also amplifies the harm from a misaligned one. We characterize how the designer should select attributes to balance these competing forces, depending on their beliefs about the AI's reliability. We show that the designer should optimally disclose attributes that identify rare segments of the population in which the need for treatment is high, and pool the remaining patients.

Date: 2025-09

Download: http://arxiv.org/pdf/2509.14396 (latest version, PDF)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2509.14396

Series: Papers from arXiv.org