Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules

Giancarlo La Camera and Barry J Richmond

PLOS Computational Biology, 2008, vol. 4, issue 8, 1-17

Abstract: It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. 
The schedule length effect might be a manifestation of these phenomena in monkeys.

Author Summary: Theories of rational behavior are built on a number of principles, including the assumption that subjects adjust their behavior to maximize their long-term returns and that they should work equally hard to obtain a reward in situations where the effort to obtain reward is the same (called the invariance principle). Humans, however, are sensitive to the manner in which equivalent choices are presented, or “framed,” and often have a greater tendency to continue an endeavor once an investment in money, effort, or time has been made, a phenomenon known as the “sunk cost” effect. In a similar manner, when monkeys must perform different numbers of trials to obtain a reward, they work harder as the number of trials already performed increases, even though both the work remaining and the forthcoming reward are the same in all situations. Methods from the theory of Reinforcement Learning, which usually provide learning strategies aimed at maximizing returns, cannot model this violation of invariance. Here we generalize a prominent method of Reinforcement Learning so as to explain the violation of invariance, without losing the ability to model behaviors explained by standard Reinforcement Learning models. This generalization extends our understanding of how animals and humans learn and behave.

Date: 2008

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000131 (text/html)
https://journals.plos.org/ploscompbiol/article?id= ... 00131&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1000131

DOI: 10.1371/journal.pcbi.1000131

More articles in PLOS Computational Biology from Public Library of Science