Inverse Reinforcement Learning with Just Classification and a Few Regressions

van der Laan, Lars; Kallus, Nathan; Bibaut, Aurelien

Inverse Reinforcement Learning with Just Classification and a Few Regressions

Lars van der Laan, Nathan Kallus and Aurelien Bibaut

Abstract: Inverse reinforcement learning (IRL) aims to infer rewards from observed behavior, but rewards are not identified from the policy alone: many reward--value pairs can rationalize the same actions. Meaningful reward recovery therefore requires a normalization, yet existing normalized IRL methods often rely on anchor-action restrictions or specialized neural architectures. We study reward recovery in the maximum-entropy, or Gumbel-shock, model under a broad class of statewise affine normalizations, with anchor-action constraints as a special case. This yields Generalized Policy-to-$Q$-to-Reward (GenPQR), a modular procedure that estimates the behavior policy, evaluates its soft $Q$-function through the Bellman equation, and recovers the normalized reward. Both stages can be implemented with off-the-shelf classification and regression methods. We prove modular finite-sample guarantees under general function approximation, with separate policy-estimation and $Q$-estimation errors. As a concrete instantiation, we study GenPQR with fitted $Q$-evaluation, reducing IRL to policy estimation followed by regression. Experiments show that GenPQR matches or improves reward recovery relative to DeepPQR while remaining simpler and more modular. Compared with DeepPQR, our theory goes beyond anchor actions, accommodates large and continuous action spaces, makes coverage requirements explicit, and is not tied to a specific neural-network architecture or training procedure.

Date: 2025-09, Revised 2026-05
New Economics Papers: this item is included in nep-cmp and nep-inv
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://arxiv.org/pdf/2509.21172 Latest version (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2509.21172

Access Statistics for this paper

More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().