Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model

Kallus, Nathan

Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model

Nathan Kallus

Abstract: Aligning large language models (LLMs) to preference data typically assumes a known link function between observed preferences and latent rewards (e.g., a logistic Bradley-Terry link). Misspecification of this link can bias inferred rewards and misalign learned policies. We study preference alignment under an unknown and unrestricted link function. We show that realizability of $f$-divergence-constrained reward maximization in a policy class induces a semiparametric single-index binary choice model, where a scalar policy-dependent index captures all dependence on demonstrations and the remaining preference distribution is unrestricted. Rather than assuming this model has identifiable finite-dimensional structural parameters and estimating them, as in econometrics, we focus on policy learning with the reward function implicit, analyzing error to the optimal policy and allowing for unidentifiable nonparametric indices. We develop preference optimization algorithms robust to the unknown link and prove convergence guarantees in terms of generic function complexity measures. We demonstrate this empirically on LLM alignment. Code is available at https://github.com/causalml/spo/

Date: 2025-12, Revised 2026-02
New Economics Papers: this item is included in nep-cmp, nep-dcm and nep-ecm
References: Add references at CitEc
Citations:

Downloads: (external link)
http://arxiv.org/pdf/2512.21917 Latest version (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2512.21917

Access Statistics for this paper

More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().