Multimodal Dynamic Pricing

Wang, Yining; Chen, Boxiao; Simchi-Levi, David

Yining Wang, Boxiao Chen and David Simchi-Levi
Additional contact information
Yining Wang: Department of Information Systems and Operations Management, Warrington College of Business, University of Florida, Gainesville, Florida 32611
Boxiao Chen: Department of Information and Decision Sciences, College of Business Administration, University of Illinois, Chicago, Illinois 60607
David Simchi-Levi: Institute for Data, Systems, and Society, Department of Civil and Environmental Engineering, and Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Management Science, 2021, vol. 67, issue 10, 6136-6152

Abstract: We consider single-product dynamic pricing with demand learning. Candidate prices may range over a wide price interval, and the demand function is modeled nonparametrically, subject only to smoothness regularity conditions. An important aspect of our model is that the expected reward function may be nonconcave and indeed multimodal, which leads to many conceptual and technical challenges. Our proposed algorithm is inspired by both the Upper-Confidence-Bound algorithm for multiarmed bandits and the Optimism-in-the-Face-of-Uncertainty principle arising from linear contextual bandits. The multiarmed bandit formulation arises from a local-bin approximation of an unknown continuous demand function, and the linear contextual bandit formulation is then applied to obtain more accurate local polynomial approximators within each bin. Through rigorous regret analysis, we demonstrate that our proposed algorithm achieves optimal worst-case regret over a wide range of smooth function classes. More specifically, for k-times smooth functions and T selling periods, the regret of our proposed algorithm is $\tilde{O}(T^{(k+1)/(2k+1)})$, which is shown to be optimal via the development of information-theoretic lower bounds. We also show that in special cases, such as strongly concave or infinitely smooth reward functions, our algorithm achieves an $O(\sqrt{T})$ regret, matching the optimal regret established in previous works. Finally, we present computational results that verify the effectiveness of our method in numerical simulations.
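
The local-bin idea in the abstract can be illustrated with a much-simplified sketch. The Python code below is not the authors' algorithm: it only splits the price interval into equal-width bins and runs a standard UCB1 rule over the bin midpoints, omitting the local polynomial (linear contextual bandit) step entirely. The function names, the toy demand curve, and all parameter values are hypothetical and chosen only so that the revenue curve has two peaks.

```python
import math
import random


def toy_demand(p):
    """Hypothetical purchase probability; revenue p * toy_demand(p) has peaks near 0.3 and 0.75."""
    revenue = (0.25 * math.exp(-((p - 0.30) / 0.08) ** 2)
               + 0.40 * math.exp(-((p - 0.75) / 0.08) ** 2))
    return min(1.0, revenue / p)  # assumes p > 0


def binned_ucb_pricing(demand_prob, p_min, p_max, n_bins, horizon, seed=0):
    """UCB1 over midpoints of equal-width price bins (simplified illustration only).

    Returns (total_revenue, pull_counts, bin_midpoint_prices).
    """
    rng = random.Random(seed)
    width = (p_max - p_min) / n_bins
    prices = [p_min + (j + 0.5) * width for j in range(n_bins)]
    counts = [0] * n_bins   # number of times each bin's price was offered
    sums = [0.0] * n_bins   # cumulative revenue collected in each bin
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_bins:
            j = t - 1  # offer every bin's price once before using the index
        else:
            # Per-period revenue lies in [0, p_max], so the bonus is scaled by p_max.
            j = max(
                range(n_bins),
                key=lambda i: sums[i] / counts[i]
                + p_max * math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        sold = rng.random() < demand_prob(prices[j])
        reward = prices[j] if sold else 0.0
        counts[j] += 1
        sums[j] += reward
        total += reward
    return total, counts, prices


total, counts, prices = binned_ucb_pricing(toy_demand, 0.0, 1.0, n_bins=10, horizon=20000)
best = prices[counts.index(max(counts))]
print(f"most-offered price: {best:.2f}, total revenue: {total:.1f}")
```

In this toy setup the lower revenue peak near 0.3 does not trap the rule, because every bin keeps an optimistic index until it has been sampled enough. A fixed bin midpoint ignores the smoothness of the demand curve inside each bin, which is the gap that the local polynomial approximators described in the abstract are meant to close.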

Keywords: multimodal reward function; dynamic pricing; nonparametric learning; asymptotic analyses
Date: 2021

Downloads: (external link)
http://dx.doi.org/10.1287/mnsc.2020.3819 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:67:y:2021:i:10:p:6136-6152