Dynamic Learning and Decision Making via Basis Weight Vectors
Hao Zhang
Hao Zhang: Sauder School of Business, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada
Operations Research, 2022, vol. 70, issue 3, 1835-1853
Abstract:
This paper presents a new methodology to solve a general model of dynamic decision making with a continuous unknown parameter or state. The methodology centers on the “continuation-value functions” (mappings from the parameter space to the continuation-value space), created by feasible continuation policies. When the model primitives can be described through a family of basis functions (e.g., polynomials), a continuation-value function retains that property and can be represented by a basis weight vector. The set of efficient basis weight vectors can be constructed through backward induction, which leads to a significant reduction of problem complexity and enables an exact solution for small-sized problems. A set of approximation methods based on the new methodology is developed to tackle larger problems. The methodology is also extended to the multidimensional (multiparameter) setting, which features the problem of contextual multiarmed bandits with linear expected rewards. The approximation algorithm developed in this paper outperforms three benchmark algorithms (epsilon-greedy, Thompson sampling, and LinUCB) in learning situations with many actions and short horizons.
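The basis-representation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm; it only assumes a monomial basis (1, theta, theta^2) on a scalar parameter in [0, 1] and a Uniform(0, 1) belief, and the function names and the two weight vectors are hypothetical. It shows that once a continuation-value function is stored as a weight vector, its expected value under a belief is a dot product between the weights and the belief's raw moments.

```python
# Hypothetical illustration: continuation-value functions as weight vectors
# over the monomial basis 1, theta, theta**2 (an assumption, not from the paper).

def value_at(weights, theta):
    """Evaluate sum_k weights[k] * theta**k at a given parameter value."""
    return sum(w * theta**k for k, w in enumerate(weights))

def expected_value(weights, moments):
    """Expected value under a belief with raw moments E[theta**k] = moments[k]."""
    return sum(w * m for w, m in zip(weights, moments))

def uniform_moments(degree):
    """Raw moments of the Uniform(0, 1) belief: E[theta**k] = 1 / (k + 1)."""
    return [1.0 / (k + 1) for k in range(degree + 1)]

# Two made-up continuation policies, each summarized by a basis weight vector.
w_a = [1.0, 0.0, 0.0]    # value 1 for every theta
w_b = [0.0, 3.0, -1.0]   # value 3*theta - theta**2

m = uniform_moments(2)
ev_a = expected_value(w_a, m)
ev_b = expected_value(w_b, m)

# Under this belief, pick the policy with the larger expected continuation value.
best = "a" if ev_a >= ev_b else "b"
```

Because the comparison depends on the belief only through its moments, the same weight vectors can be reused as the belief is updated; how the paper constructs and prunes the set of efficient weight vectors is not shown here.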
Keywords: Stochastic Models; learning and doing; dynamic pricing with learning; linear contextual bandits; approximate dynamic programming; basis representation of functions
Date: 2022
Downloads:
http://dx.doi.org/10.1287/opre.2021.2240 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:70:y:2022:i:3:p:1835-1853
Operations Research is published by INFORMS.