Optimal Policies for Dynamic Pricing and Inventory Control with Nonparametric Censored Demands

Boxiao Chen, Yining Wang and Yuan Zhou
Additional contact information
Boxiao Chen: College of Business Administration, University of Illinois, Chicago, Illinois 60607
Yining Wang: Naveen Jindal School of Management, University of Texas at Dallas, Richardson, Texas 75080
Yuan Zhou: Yau Mathematical Sciences Center & Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China

Management Science, 2024, vol. 70, issue 5, 3362-3380

Abstract: We study the classic model of joint pricing and inventory control with lost sales over T consecutive review periods. The firm does not know the demand distribution a priori and needs to learn it from historical censored demand data. We develop nonparametric online learning algorithms that converge to the clairvoyant optimal policy at the fastest possible speed. The fundamental challenges lie in the fact that neither zeroth-order nor first-order feedback is accessible to the firm, and the reward at any single price is not observable because of demand censoring. We propose a novel inversion method based on empirical measures to consistently estimate the difference of the instantaneous reward functions at two prices, directly tackling the fundamental challenge brought by censored demands. Based on this technical innovation, we design bisection and trisection search methods that attain an $\tilde{O}(\sqrt{T})$ regret for the case with concave reward functions, and we design an active tournament elimination method that attains $\tilde{O}(T^{3/5})$ regret when the reward functions are nonconcave. We complement the $\tilde{O}(T^{3/5})$ regret upper bound with a matching $\tilde{\Omega}(T^{3/5})$ regret lower bound. The lower bound is established by a novel information-theoretical argument based on generalized squared Hellinger distance, which is significantly different from conventional arguments that are based on Kullback-Leibler divergence. Both the upper bound technique based on the "difference estimator" and the lower bound technique based on generalized Hellinger distance are new in the literature, and can potentially be applied to solve other inventory or censored demand type problems that involve learning.
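
To illustrate why an estimate of the reward difference at two prices can be enough to search a concave reward curve, here is a minimal Python sketch of a generic trisection search driven only by pairwise differences. It is not the authors' algorithm: the names `trisection_argmax`, `reward`, and `exact_diff` are hypothetical, the difference oracle is exact and noise-free (standing in for the paper's statistical difference estimator), and censoring, inventory decisions, and regret accounting are not modeled.

```python
from typing import Callable


def trisection_argmax(
    diff: Callable[[float, float], float],
    lo: float,
    hi: float,
    tol: float = 1e-6,
) -> float:
    """Locate the maximizer of a concave function f on [lo, hi].

    `diff(a, b)` must return f(a) - f(b); the values of f itself are
    never queried, only differences between two points.
    """
    while hi - lo > tol:
        third = (hi - lo) / 3.0
        left, right = lo + third, hi - third
        if diff(left, right) < 0:
            # f(left) < f(right): by concavity the maximizer is not in [lo, left].
            lo = left
        else:
            # f(left) >= f(right): a maximizer lies in [lo, right].
            hi = right
    return (lo + hi) / 2.0


# Hypothetical concave reward with its maximizer at price 3.0 (illustration only).
def reward(p: float) -> float:
    return -(p - 3.0) ** 2


# Assumed exact difference oracle; a real learner would only have a noisy estimate.
def exact_diff(a: float, b: float) -> float:
    return reward(a) - reward(b)


best_price = trisection_argmax(exact_diff, 0.0, 10.0)
```

Each iteration discards one third of the interval using a single comparison of two prices, so the search never needs the reward level at any individual price.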

Keywords: dynamic pricing; inventory replenishment; censored demand; lost sales; regret minimization; bandit learning; nonconcavity
Date: 2024

Downloads: (external link)
http://dx.doi.org/10.1287/mnsc.2023.4859 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:70:y:2024:i:5:p:3362-3380

Bibliographic data for series maintained by Chris Asher.

 
Page updated 2025-03-22
Handle: RePEc:inm:ormnsc:v:70:y:2024:i:5:p:3362-3380