
Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information

Boxiao Chen, David Simchi-Levi, Yining Wang and Yuan Zhou
Additional contact information
Boxiao Chen: College of Business Administration, University of Illinois, Chicago, Illinois 60607
David Simchi-Levi: Institute for Data, Systems and Society, Operations Research Center, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Yining Wang: Warrington College of Business, University of Florida, Gainesville, Florida 32611
Yuan Zhou: Department of Industrial & Enterprise Systems Engineering, University of Illinois, Urbana-Champaign, Illinois 61801; Yanqi Lake Beijing Institute of Mathematical Science and Applications, Beijing 101408, China

Management Science, 2022, vol. 68, issue 8, 5684-5703

Abstract: We consider the periodic-review dynamic pricing and inventory control problem with a fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated (s, S, p) policy is proved to be optimal, where s and S are the reorder point and order-up-to level of the ordering strategy, and p, a function of the on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal (s, S, p) policy with a tight Õ(√T) regret rate. A number of salient features differentiate our work from existing online learning research in the operations management (OM) literature. First, computing the optimal (s, S, p) policy requires solving a dynamic program (DP) over multiple periods involving unknown quantities, unlike the majority of learning problems in OM, which require solving only single-period optimization problems. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The need to analyze action-dependent state transitions over multiple periods resembles a reinforcement learning problem, considerably more difficult than existing bandit learning settings. Second, the pricing function p is infinite dimensional, and learning it is much more challenging than learning a finite number of parameters, as in existing research. The demand-price relationship is estimated using upper confidence bounds, but the confidence interval cannot be explicitly calculated due to the complexity of the DP recursion.
Finally, because of the multiperiod nature of (s, S, p) policies, the actual distribution of the demand randomness plays an important role in determining the optimal pricing strategy p, but this distribution is unknown to the learner a priori. In this paper, the demand randomness is approximated by an empirical distribution constructed from dependent samples, and a novel Wasserstein-metric-based argument is employed to prove convergence of the empirical distribution.
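To make the (s, S, p) policy structure concrete, the following minimal Python sketch simulates one sample path of a periodic-review system under such a policy. The linear demand model, cost parameters, and pricing rule here are illustrative assumptions for exposition only; they are not the paper's demand model or its learning algorithm.

```python
import random

def simulate_sSp(s, S, price_of, T=1000, K=10.0, c=1.0, h=0.1, b=0.5, seed=0):
    """Simulate a periodic-review (s, S, p) policy and return average profit.

    s: reorder point; S: order-up-to level;
    price_of: the pricing rule p, mapping on-hand inventory to a price;
    K: fixed ordering cost; c: unit ordering cost;
    h: per-unit holding cost; b: per-unit backlog cost.
    """
    rng = random.Random(seed)
    x = float(S)                 # on-hand inventory (negative = backlog)
    profit = 0.0
    for _ in range(T):
        if x <= s:               # reorder point reached: order up to S
            profit -= K + c * (S - x)   # pay fixed cost K plus variable cost
            x = float(S)
        p = price_of(x)          # price depends on current inventory level
        # Assumed demand model: linear in price plus Gaussian noise
        d = max(0.0, 10.0 - p + rng.gauss(0.0, 1.0))
        profit += p * d          # backlogged demand is eventually filled
        x -= d
        # Holding cost on positive inventory, backlog cost on shortfall
        profit -= h * max(x, 0.0) + b * max(-x, 0.0)
    return profit / T            # average per-period profit
```

For instance, `simulate_sSp(2, 10, lambda x: 5.0)` evaluates a constant-price rule; under incomplete demand information, the paper's algorithms must instead learn both the demand-price relationship and the noise distribution while controlling this system.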

Keywords: dynamic pricing; inventory control; fixed ordering cost; online learning; asymptotic analysis (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (6)

Downloads: (external link)
http://dx.doi.org/10.1287/mnsc.2021.4171 (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.


Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:68:y:2022:i:8:p:5684-5703


More articles in Management Science from INFORMS. Contact information at EDIRC.
Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-03-19
Handle: RePEc:inm:ormnsc:v:68:y:2022:i:8:p:5684-5703