Improving network dynamic pricing policies through offline reinforcement learning
Philipp Hausenblas, Dominik Eichhorn, Andreas Brieden, Matthias Soppert and Claudius Steinhardt
Additional contact information
Philipp Hausenblas: University of the Bundeswehr Munich, Chair of Data Analytics & Statistics
Dominik Eichhorn: University of the Bundeswehr Munich, Chair of Data Analytics & Statistics
Andreas Brieden: University of the Bundeswehr Munich, Chair of Data Analytics & Statistics
Matthias Soppert: University of the Bundeswehr Munich, Chair of Business Analytics & Management Science
Claudius Steinhardt: University of the Bundeswehr Munich, Chair of Business Analytics & Management Science
OR Spectrum: Quantitative Approaches in Management, 2025, vol. 47, issue 4, No 5, 1217-1266
Abstract:
Due to exponentially growing state and action spaces, network dynamic pricing problems are analytically intractable, so state-of-the-art approaches rely on heuristics. Reinforcement learning has been applied successfully in various complex domains, but its applicability to pricing may be limited by two factors. First, the need for extensive exploration of the state and action spaces causes lost revenue when training directly in the real world. Second, replicating the real world in an accurate simulation and training therein has its own limitations, because calibrating the simulation requires precise domain knowledge, which in general does not exist. To overcome these issues, we propose a new dynamic pricing approach based on offline reinforcement learning. In contrast to online reinforcement learning, training requires only a static data set of historic sales, generated by applying some arbitrary behavior policy in the past. In particular, we develop a low-dimensional state and action space reformulation of the considered generic dynamic pricing problem, which allows us to incorporate the critic-regularized regression algorithm within a scalable approach. We also adapt the standard algorithm's actor loss function so that it can handle the pricing problem's state-dependent action space. Our studies show that the trained policy dominates and in some cases substantially outperforms the respective behavior policy. Hence, although some limitations remain to be discussed, offline reinforcement learning seems to be a promising approach for dynamic pricing when online reinforcement learning is not an option.
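To illustrate the kind of adaptation the abstract describes, the following is a minimal sketch of a critic-regularized-regression-style actor loss in which an explicit feasibility mask restricts the actor to the prices offerable in the current state. It is not the authors' code: the PyTorch implementation, the exponential advantage weighting variant, and all names (policy_logits, q_values, valid_action_mask, beta) are illustrative assumptions.

    # Sketch: CRR-style actor loss with a state-dependent action mask (assumed, not the paper's code).
    import torch
    import torch.nn.functional as F

    def crr_actor_loss(policy_logits, q_values, actions, valid_action_mask, beta=1.0):
        """Advantage-weighted behavior cloning restricted to feasible actions.

        policy_logits:     (batch, n_actions) raw actor outputs
        q_values:          (batch, n_actions) critic estimates Q(s, a)
        actions:           (batch,) actions taken by the logged behavior policy
        valid_action_mask: (batch, n_actions) 1 for prices offerable in the state, 0 otherwise
        """
        # Mask infeasible actions before the softmax, so the actor assigns
        # no probability mass to prices it cannot offer in this state.
        masked_logits = policy_logits.masked_fill(valid_action_mask == 0, float("-inf"))
        log_probs = F.log_softmax(masked_logits, dim=-1)

        with torch.no_grad():
            # State value as the policy-weighted mean of Q over feasible actions.
            probs = log_probs.exp()
            v = (probs * q_values).sum(dim=-1, keepdim=True)
            advantage = q_values.gather(1, actions.unsqueeze(1)) - v
            # Exponential advantage weights, clipped for numerical stability.
            weights = torch.clamp(torch.exp(advantage / beta), max=20.0).squeeze(1)

        # Weighted negative log-likelihood of the logged actions.
        nll = -log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
        return (weights * nll).mean()

The mask is the point of the sketch: in network pricing, the set of admissible prices depends on the remaining capacity in the state, so the behavior-cloning term is evaluated only over actions that are actually feasible there.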
Keywords: Network revenue management; Dynamic pricing; Offline reinforcement learning; Critic-regularized regression
Date: 2025
Downloads: http://link.springer.com/10.1007/s00291-025-00821-2 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:orspec:v:47:y:2025:i:4:d:10.1007_s00291-025-00821-2
DOI: 10.1007/s00291-025-00821-2
OR Spectrum: Quantitative Approaches in Management is currently edited by Rainer Kolisch