EconPapers

Reinforcement learning versus data-driven dynamic programming: a comparison for finite horizon dynamic pricing markets

Fabian Lange, Leonard Dreessen and Rainer Schlosser
Additional contact information
Fabian Lange: University of Potsdam
Leonard Dreessen: University of Potsdam
Rainer Schlosser: University of Potsdam

Journal of Revenue and Pricing Management, 2025, vol. 24, issue 6, No 7, 584-600

Abstract: Revenue management (RM) plays a vital role in optimizing sales processes in real-life applications under incomplete information. Predicting consumer demand and anticipating competitors' price reactions have become key factors in RM, enabling classical dynamic programming (DP) methods to be applied for expected long-term reward maximization. Modern model-free deep reinforcement learning (RL) approaches can derive optimized policies without explicitly estimating the underlying model dynamics. However, RL algorithms typically require either vast amounts of training data or a suitable synthetic model to train on. As existing studies focus on only one group of algorithms, the relationship between established DP approaches and newer RL techniques remains opaque. To address this issue, we use a dynamic pricing framework for an airline ticket market to compare state-of-the-art RL algorithms and data-driven versions of classic DP methods with respect to (i) performance and (ii) required data. For the DP techniques, we use estimates of the market dynamics so that their performance and data consumption can be compared against the RL methods. The numerical results of our experiments, which cover monopoly as well as duopoly markets, allow us to study how the different approaches' performances relate to each other in exemplary settings. In both setups, we find that with little data (about 10 episodes) the fitted DP methods were highly competitive; with medium amounts of data (about 100 episodes) the DP methods were outperformed by RL, with PPO providing the best results. Given large amounts of training data (about 1000 episodes), the best RL algorithms, i.e., TD3, DDPG, PPO, and SAC, performed similarly, achieving about 90% or more of the optimal solution.
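
The data-driven DP baseline described in the abstract rests on a standard finite-horizon Bellman recursion: given a demand model estimated from past sales episodes, prices are chosen by backward induction over time and remaining inventory. The following Python sketch illustrates this idea for a simple monopoly setting; the logistic sale-probability model, the parameter values, and all names are illustrative assumptions, not the authors' implementation.

import numpy as np

T = 20                                   # number of selling periods (assumed)
C = 10                                   # initial inventory (assumed)
PRICES = np.linspace(50.0, 200.0, 16)    # discretized price grid (assumed)

def estimated_sale_prob(price, a=4.0, b=0.03):
    # Estimated probability of selling one unit at `price` in one period.
    # In a data-driven setting, a and b would be fitted from observed episodes;
    # here they are illustrative constants.
    return 1.0 / (1.0 + np.exp(-(a - b * price)))

# V[t, c] = expected revenue-to-go with c units left at the start of period t.
V = np.zeros((T + 1, C + 1))
policy = np.zeros((T, C + 1))

for t in range(T - 1, -1, -1):           # backward induction over time
    for c in range(1, C + 1):            # no pricing decision when c == 0
        best_value, best_price = -np.inf, PRICES[0]
        for p in PRICES:
            q = estimated_sale_prob(p)
            # Bellman equation: sell one unit w.p. q, keep inventory w.p. 1-q.
            value = q * (p + V[t + 1, c - 1]) + (1.0 - q) * V[t + 1, c]
            if value > best_value:
                best_value, best_price = value, p
        V[t, c] = best_value
        policy[t, c] = best_price

print("Estimated optimal expected revenue:", round(V[0, C], 2))
print("Price to charge at t=0 with full inventory:", policy[0, C])

A model-free RL method such as PPO would instead learn a pricing policy directly from interaction data, without fitting the sale-probability model explicitly; the paper compares how much data each route needs to approach the optimal solution.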

Keywords: Dynamic pricing; Decision support; Method comparison; Approximate dynamic programming; Reinforcement learning
Date: 2025

Downloads:
http://link.springer.com/10.1057/s41272-025-00519-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:pal:jorapm:v:24:y:2025:i:6:d:10.1057_s41272-025-00519-8

Ordering information: This journal article can be ordered from
https://www.palgrave.com/gp/journal/41272

DOI: 10.1057/s41272-025-00519-8

Journal of Revenue and Pricing Management is currently edited by Ian Yeoman

More articles in Journal of Revenue and Pricing Management from Palgrave Macmillan
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Handle: RePEc:pal:jorapm:v:24:y:2025:i:6:d:10.1057_s41272-025-00519-8