Theory of Choice in Bandit, Information Sampling and Foraging Tasks
Bruno B Averbeck
PLOS Computational Biology, 2015, vol. 11, issue 3, 1-28
Abstract:
Decision making has been studied with a wide array of tasks. Here we examine the theoretical structure of bandit, information sampling and foraging tasks. These tasks move beyond tasks where the choice in the current trial does not affect future expected rewards. We have modeled these tasks using Markov decision processes (MDPs). MDPs provide a general framework for modeling tasks in which decisions affect the information on which future choices will be made. Under the assumption that agents are maximizing expected rewards, MDPs provide normative solutions. We find that all three classes of tasks pose choices among actions which trade-off immediate and future expected rewards. The tasks drive these trade-offs in unique ways, however. For bandit and information sampling tasks, increasing uncertainty or the time horizon shifts value to actions that pay-off in the future. Correspondingly, decreasing uncertainty increases the relative value of actions that pay-off immediately. For foraging tasks the time-horizon plays the dominant role, as choices do not affect future uncertainty in these tasks.Author Summary: Numerous choice tasks have been used to study decision processes. Some of these choice tasks, specifically n-armed bandit, information sampling and foraging tasks, pose choices that trade-off immediate and future reward. Specifically, the best choice may not be the choice that pays off the highest reward immediately, and exploration of unknown options vs. exploiting known options can be a normatively useful strategy. We characterized the optimal choice strategies across these tasks using Markov Decision Processes (MDPs). The MDP framework can characterize optimal choice strategies when choices are affected by the value of future rewards. We found that uncertainty and time horizon have important effects on the choice strategies in these tasks. Specifically, in bandit and information sampling tasks, increasing uncertainty increases the value of exploring choice options that tend to pay off in the future, while decreasing uncertainty increases the value of choice options that pay off immediately. These effects are increased when time horizons are longer. Foraging tasks differ in that uncertainty plays a minimal role. However, time horizon is still important in foraging. Specifically, for long time horizons, travel delays to rewards become less relevant.
Date: 2015
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (7)
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004164 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 04164&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1004164
DOI: 10.1371/journal.pcbi.1004164
Access Statistics for this article
More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().