Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats
David B. Brown and
James E. Smith
Additional contact information
David B. Brown: Fuqua School of Business, Duke University, Durham, North Carolina 27708
James E. Smith: Fuqua School of Business, Duke University, Durham, North Carolina 27708
Operations Research, 2013, vol. 61, issue 3, 644-665
Abstract:
This paper was motivated by the problem of developing an optimal policy for exploring an oil and gas field in the North Sea. Where should we drill first? Where do we drill next? In this and many other problems, we face a trade-off between earning (e.g., drilling immediately at the sites with maximal expected values) and learning (e.g., drilling at sites that provide valuable information) that may lead to greater earnings in the future. These “sequential exploration problems” resemble a multiarmed bandit problem, but probabilistic dependence plays a key role: outcomes at drilled sites reveal information about neighboring targets. Good exploration policies will take advantage of this information as it is revealed. We develop heuristic policies for sequential exploration problems and complement these heuristics with upper bounds on the performance of an optimal policy. We begin by grouping the targets into clusters of manageable size. The heuristics are derived from a model that treats these clusters as independent. The upper bounds are given by assuming each cluster has perfect information about the results from all other clusters. The analysis relies heavily on results for bandit superprocesses, a generalization of the multiarmed bandit problem. We evaluate the heuristics and bounds using Monte Carlo simulation and, in the North Sea example, we find that the heuristic policies are nearly optimal.
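The earning-versus-learning trade-off and the idea of bounding an optimal policy with perfect information can be illustrated on a toy problem. The sketch below is not the authors' model or their clustered heuristic. It assumes a hypothetical two-site field with a made-up joint wet/dry distribution, payoff, and drilling cost, and it uses a full perfect-information (clairvoyant) bound, which is simpler than the cluster-wise bound described in the abstract. All names (JOINT, PAYOFF_WET, DRILL_COST, and the functions) are illustrative assumptions.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical joint distribution over (site 0, site 1); 1 = wet, 0 = dry.
# The two sites are positively dependent: a wet result raises the odds next door.
JOINT = {
    (1, 1): Fraction(3, 10),
    (1, 0): Fraction(1, 10),
    (0, 1): Fraction(1, 10),
    (0, 0): Fraction(5, 10),
}
PAYOFF_WET = Fraction(20)  # assumed revenue from a wet site
DRILL_COST = Fraction(9)   # assumed cost of drilling any site
N_SITES = 2


def consistent(outcome, observed):
    """True if a joint outcome agrees with every drilled site's result."""
    return all(outcome[site] == result for site, result in observed)


def prob_wet(site, observed):
    """P(site is wet | results observed so far)."""
    total = sum(p for o, p in JOINT.items() if consistent(o, observed))
    wet = sum(p for o, p in JOINT.items()
              if consistent(o, observed) and o[site] == 1)
    return wet / total


def drill_value(site, observed):
    """Immediate expected profit from drilling `site` now."""
    return prob_wet(site, observed) * PAYOFF_WET - DRILL_COST


@lru_cache(maxsize=None)
def optimal_value(observed=frozenset()):
    """Exact dynamic program: best expected profit given observed results."""
    drilled = {site for site, _ in observed}
    best = Fraction(0)  # stopping is always allowed
    for site in range(N_SITES):
        if site in drilled:
            continue
        p = prob_wet(site, observed)
        value = drill_value(site, observed)
        if p > 0:
            value += p * optimal_value(observed | {(site, 1)})
        if p < 1:
            value += (1 - p) * optimal_value(observed | {(site, 0)})
        best = max(best, value)
    return best


def myopic_value(observed=frozenset()):
    """Earn-only policy: drill the best site only if its immediate value is positive."""
    drilled = {site for site, _ in observed}
    candidates = [s for s in range(N_SITES) if s not in drilled]
    if not candidates:
        return Fraction(0)
    site = max(candidates, key=lambda s: drill_value(s, observed))
    value = drill_value(site, observed)
    if value <= 0:
        return Fraction(0)
    p = prob_wet(site, observed)
    if p > 0:
        value += p * myopic_value(observed | {(site, 1)})
    if p < 1:
        value += (1 - p) * myopic_value(observed | {(site, 0)})
    return value


def clairvoyant_value():
    """Perfect-information upper bound: drill exactly the wet sites."""
    gain = max(PAYOFF_WET - DRILL_COST, Fraction(0))
    return sum(p * gain * sum(outcome) for outcome, p in JOINT.items())


if __name__ == "__main__":
    print("myopic     :", myopic_value())
    print("optimal    :", optimal_value())
    print("clairvoyant:", clairvoyant_value())
```

Under these assumed numbers the myopic policy never drills (each site has expected value -1 on its own) and earns 0, the optimal policy drills one site to learn about the other and earns 7/5, and the clairvoyant bound is 44/5. The gap between the last two shows why a full perfect-information bound can be loose, which is one motivation for tighter relaxations.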
Keywords: dynamic programming; multiarmed bandits; bandit superprocesses; information relaxations
Date: 2013
Downloads: (external link)
http://dx.doi.org/10.1287/opre.2013.1164 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:61:y:2013:i:3:p:644-665