Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models

Roberto-Rafael Maura-Rivero, Chirag Nagpal, Roma Patel and Francesco Visin

Papers from arXiv.org

Abstract: Current methods that train large language models (LLMs) with reinforcement learning feedback often resort to averaging the outputs of multiple reward functions during training. This overlooks crucial aspects of individual reward dimensions and inter-reward dependencies, which can lead to sub-optimal generations. In this work, we show how linear aggregation of rewards exhibits vulnerabilities that can lead to undesired properties in generated text. We then propose a transformation of reward functions inspired by the economic theory of utility functions (specifically the Inada conditions) that enhances sensitivity to low reward values while diminishing sensitivity to already-high values. We compare our approach to existing baseline methods that linearly aggregate rewards and show that the Inada-inspired reward feedback is superior to traditional weighted averaging. We quantitatively and qualitatively analyse the differences between the methods and find that models trained with Inada transformations score as more helpful while being less harmful.
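Illustrative sketch (not taken from the paper): the abstract describes replacing linear reward averaging with a concave, Inada-style transform that is highly sensitive to low reward values and insensitive to already-high ones. The sketch below, in Python, assumes rewards normalised to [0, 1] and uses the power transform r**alpha (one canonical family satisfying the Inada conditions); the function names, weights and reward values are hypothetical, not the paper's actual construction.

import numpy as np

def inada_transform(r, alpha=0.5, eps=1e-8):
    """Concave power transform f(r) = r**alpha, 0 < alpha < 1.

    Satisfies the Inada conditions on rewards in [0, 1]: the marginal value
    diverges as r -> 0 and shrinks as r grows, so low rewards dominate the
    training signal relative to already-high ones.
    """
    return np.clip(np.asarray(r, dtype=float), eps, None) ** alpha

def aggregate(rewards, weights, transform=None):
    """Weighted aggregation of per-dimension rewards.

    transform=None reproduces the linear-averaging baseline; passing
    inada_transform applies the concave transform to each reward
    dimension before averaging.
    """
    r = np.asarray(rewards, dtype=float)
    w = np.asarray(weights, dtype=float)
    if transform is not None:
        r = transform(r)
    return float(np.dot(w, r) / w.sum())

# Hypothetical generation: very helpful but scoring poorly on harmlessness.
rewards = [0.90, 0.05]   # [helpfulness, harmlessness]
weights = [0.5, 0.5]

# Marginal gain from improving each dimension by a small step: under linear
# averaging both gains are identical, so training has no extra incentive to
# fix the failing dimension; under the concave transform the low
# harmlessness reward dominates.
delta = 0.01
for transform, label in [(None, "linear"), (inada_transform, "Inada-style")]:
    base = aggregate(rewards, weights, transform)
    gains = []
    for i in range(len(rewards)):
        bumped = list(rewards)
        bumped[i] += delta
        gains.append(aggregate(bumped, weights, transform) - base)
    print(f"{label:12s} marginal gains (helpfulness, harmlessness): "
          f"{gains[0]:.4f}, {gains[1]:.4f}")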

Date: 2025-01, Revised 2025-02
New Economics Papers: this item is included in nep-big, nep-cmp and nep-upt

Downloads: (external link)
http://arxiv.org/pdf/2501.06248 Latest version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2501.06248

More papers in Papers from arXiv.org
Bibliographic data for this series is maintained by arXiv administrators.

 
Handle: RePEc:arx:papers:2501.06248