Closing the Gap: A Learning Algorithm for Lost-Sales Inventory Systems with Lead Times

Zhang, Huanan; Chao, Xiuli; Shi, Cong

Huanan Zhang: Harold and Inge Marcus Department of Industrial and Manufacturing Engineering, Pennsylvania State University, University Park, Pennsylvania 16802
Xiuli Chao: Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48105
Cong Shi: Harold and Inge Marcus Department of Industrial and Manufacturing Engineering, Pennsylvania State University, University Park, Pennsylvania 16802

Management Science, 2020, vol. 66, issue 5, 1962-1980

Abstract: We consider a periodic-review, single-product inventory system with lost sales and positive lead times under censored demand. In contrast to the classical inventory literature, we assume the firm does not know the demand distribution a priori and makes an adaptive inventory-ordering decision in each period based only on the past sales (censored demand) data. The standard performance measure is regret, which is the cost difference between a learning algorithm and the clairvoyant (full-information) benchmark. When the benchmark is chosen to be the (full-information) optimal base-stock policy, Huh et al. [Huh WT, Janakiraman G, Muckstadt JA, Rusmevichientong P (2009a) An adaptive algorithm for finding the optimal base-stock policy in lost sales inventory systems with censored demand. Math. Oper. Res. 34(2):397–416.] developed a nonparametric learning algorithm with a cubic-root convergence rate on regret. An important open question is whether there exists a nonparametric learning algorithm whose regret rate matches the theoretical lower bound for any learning algorithm. In this work, we provide an affirmative answer to this question. More precisely, we propose a new nonparametric algorithm termed the simulated cycle-update policy and establish a square-root convergence rate on regret, which is proven to match the lower bound for any learning algorithm. Our algorithm uses a random cycle-updating rule based on an auxiliary simulated system running in parallel and also involves two new concepts, namely the withheld on-hand inventory and the double-phase cycle gradient estimation. The techniques developed are effective for learning a stochastic system with complex system dynamics and lasting impact of decisions.
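
Illustration (not from the paper): the setting described in the abstract can be sketched as a small simulation. The code below is a minimal sketch of a lost-sales system with a positive lead time under a fixed base-stock (order-up-to) level, recording only the censored sales. It does not implement the simulated cycle-update policy. The function name simulate_base_stock, the cost parameters h and p, the order of events within a period, the starting inventory, and the uniform demand are all assumptions made for illustration.

```python
import random
from collections import deque

def simulate_base_stock(S, demands, lead_time, h=1.0, p=4.0):
    """Average per-period cost of an order-up-to-S policy with lost sales.

    Assumed event order in each period: delivery, ordering, demand.
    h is a per-unit holding cost, p a per-unit lost-sales penalty (hypothetical values).
    lead_time must be at least 1, matching the positive lead times in the abstract.
    """
    on_hand = S                              # assumption: start with S units on hand
    pipeline = deque([0] * lead_time)        # outstanding orders, oldest first
    total_cost = 0.0
    sales_log = []
    for d in demands:
        on_hand += pipeline.popleft()        # order placed lead_time periods ago arrives
        position = on_hand + sum(pipeline)   # on-hand plus outstanding orders
        pipeline.append(max(S - position, 0))  # raise the inventory position to S
        sales = min(d, on_hand)              # censored demand: only sales are observed
        lost = d - sales                     # unmet demand is lost, not backlogged
        on_hand -= sales
        total_cost += h * on_hand + p * lost
        sales_log.append(sales)
    return total_cost / len(demands), sales_log

# Compare an arbitrary guess with the best level on a grid for one demand path.
random.seed(0)
demands = [random.randint(0, 10) for _ in range(5000)]
costs = {S: simulate_base_stock(S, demands, lead_time=2)[0] for S in range(41)}
best_S = min(costs, key=costs.get)
gap = costs[15] - costs[best_S]              # per-period cost gap of the guess S = 15
print(best_S, round(gap, 3))
```

The gap computed at the end is only a sample-path stand-in for regret: the paper measures regret against the full-information optimal base-stock policy, whereas this sketch searches a grid on a single simulated demand path. A learning algorithm would additionally have to choose its order quantities from sales_log alone, without access to demands.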

Keywords: inventory; lost sales; lead time; base-stock policy; censored demand; nonparametric; learning algorithms; regret analysis
Date: 2020

DOI: https://doi.org/10.1287/mnsc.2019.3288

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:66:y:2020:i:5:p:1962-1980
