Inventory Balancing with Online Learning

Cheung, Wang Chi; Ma, Will; Simchi-Levi, David; Wang, Xinshang

Wang Chi Cheung, Will Ma, David Simchi-Levi and Xinshang Wang
Additional contact information
Wang Chi Cheung: Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore, Singapore 117576
Will Ma: Graduate School of Business, Columbia University, New York, New York 10027
David Simchi-Levi: Institute for Data, Systems, and Society, Department of Civil and Environmental Engineering, and Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Xinshang Wang: Alibaba Group US, San Mateo, California 94402; Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200240, China

Management Science, 2022, vol. 68, issue 3, 1776-1807

Abstract: We study a general problem of allocating limited resources to heterogeneous customers over time under model uncertainty. Each type of customer can be serviced using different actions, each of which stochastically consumes some combination of resources and returns different rewards for the resources consumed. We consider a general model in which the resource consumption distribution associated with each customer type–action combination is not known but is consistent and can be learned over time. In addition, the sequence of customer types to arrive over time is arbitrary and completely unknown. We overcome both the challenges of model uncertainty and customer heterogeneity by judiciously synthesizing two algorithmic frameworks from the literature: inventory balancing, which “reserves” a portion of each resource for high-reward customer types that could later arrive based on competitive ratio analysis, and online learning, which “explores” the resource consumption distributions for each customer type under different actions based on regret analysis. We define an auxiliary problem, which allows for existing competitive ratio and regret bounds to be seamlessly integrated. Furthermore, we propose a new variant of upper confidence bound (UCB), dubbed lazyUCB, which conducts less exploration in a bid to focus on “exploitation” in view of the resource scarcity. Finally, we construct an information-theoretic family of counterexamples to show that our integrated framework achieves the best possible performance guarantee. We demonstrate the efficacy of our algorithms on both synthetic instances generated for the online matching with stochastic rewards problem under unknown probabilities and a publicly available hotel data set. Our framework is highly practical in that it requires no historical data (no fitted customer choice models or forecasting of customer arrival patterns) and can be used to initialize allocation strategies in fast-changing environments.
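
Illustrative sketch (not taken from the paper): the abstract describes combining an inventory-balancing rule with an optimistic estimate of unknown consumption probabilities. The Python below shows one simple way such a combination could look for online matching with stochastic rewards. Everything in it is an assumption made for illustration: the exponential penalty function, the standard UCB1-style bonus (this is not the paper's lazyUCB, whose definition is not given in the abstract), and the hypothetical names penalty, ucb and run.

```python
import math
import random


def penalty(fraction_left):
    """Assumed exponential balancing penalty: 0 when empty, 1 when full."""
    return (math.e / (math.e - 1)) * (1 - math.exp(-fraction_left))


def ucb(successes, trials, t):
    """UCB1-style optimistic estimate of a success probability, capped at 1."""
    if trials == 0:
        return 1.0  # untried pairs get the most optimistic value
    bonus = math.sqrt(2 * math.log(max(t, 2)) / trials)
    return min(1.0, successes / trials + bonus)


def run(arrivals, rewards, true_probs, capacity, seed=0):
    """Simulate one pass over an arbitrary sequence of customer types.

    arrivals   -- list of customer type indices, in arrival order
    rewards    -- rewards[j] is the reward when one unit of resource j is consumed
    true_probs -- true_probs[c][j] is the consumption probability, hidden
                  from the decision rule and used only to simulate outcomes
    capacity   -- starting inventory of each resource
    """
    rng = random.Random(seed)
    left = list(capacity)
    stats = {}  # (customer type, resource) -> [successes, trials]
    total = 0.0
    for t, ctype in enumerate(arrivals, start=1):
        best, best_score = None, 0.0
        for j in range(len(capacity)):
            if left[j] == 0:
                continue  # resource exhausted
            s, k = stats.get((ctype, j), (0, 0))
            # Optimistic expected reward, discounted as inventory runs low.
            score = rewards[j] * ucb(s, k, t) * penalty(left[j] / capacity[j])
            if score > best_score:
                best, best_score = j, score
        if best is None:
            continue  # nothing left to offer this customer
        record = stats.setdefault((ctype, best), [0, 0])
        record[1] += 1
        if rng.random() < true_probs[ctype][best]:
            record[0] += 1
            left[best] -= 1
            total += rewards[best]
    return total, left
```

For example, run([0, 1] * 50, [1.0, 2.0], [[0.9, 0.1], [0.2, 0.8]], [5, 5]) returns the total reward collected and the leftover inventory of each resource. The penalty term makes a nearly depleted resource less attractive, so some of it is held back for later arrivals, while the UCB term keeps untested customer type and resource pairs in consideration.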

Keywords: decision analysis; sequential; analysis of algorithms; suboptimal algorithms; inventory production
Date: 2022

Downloads: (external link)
http://dx.doi.org/10.1287/mnsc.2021.4216 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:68:y:2022:i:3:p:1776-1807