Going faster to see further: graphics processing unit-accelerated value iteration and simulation for perishable inventory control using JAX

Farrington, Joseph; Wong, Wai Keong; Li, Kezhi; Utley, Martin

Joseph Farrington, Wai Keong Wong, Kezhi Li and Martin Utley
Additional contact information
Joseph Farrington: University College London
Wai Keong Wong: University College London
Kezhi Li: University College London
Martin Utley: University College London

Annals of Operations Research, 2025, vol. 349, issue 3, No 5, 1609-1638

Abstract: Value iteration can find the optimal replenishment policy for a perishable inventory problem, but is computationally demanding due to the large state spaces that are required to represent the age profile of stock. The parallel processing capabilities of modern graphics processing units (GPUs) can reduce the wall time required to run value iteration by updating many states simultaneously. The adoption of GPU-accelerated approaches has been limited in operational research relative to other fields like machine learning, in which new software frameworks have made GPU programming widely accessible. We used the Python library JAX to implement value iteration and simulators of the underlying Markov decision processes in a high-level interface, and relied on this library’s function transformations and compiler to efficiently utilize GPU hardware. Our method can extend use of value iteration to settings that were previously considered infeasible or impractical. We demonstrate this on example scenarios from three recent studies which include problems with over 16 million states and additional problem features, such as substitution between products, that increase computational complexity. We compare the performance of the optimal replenishment policies to heuristic policies, fitted using simulation optimization in JAX which allowed the parallel evaluation of multiple candidate policy parameters on thousands of simulated years. The heuristic policies gave a maximum optimality gap of 2.49%. Our general approach may be applicable to a wide range of problems in operational research that would benefit from large-scale parallel computation on consumer-grade GPU hardware.
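
Illustrative sketch (not the authors' implementation): the idea of updating many states simultaneously can be shown on a toy perishable inventory problem using JAX's `vmap` and `jit` transformations. Everything specific below is an assumption made for illustration: a single product with a three-period shelf life, zero lead time, FIFO issuing, truncated Poisson demand, and the cost parameters, discount factor and function names. The state records only the stock carried over with one and two periods of remaining life.

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import poisson

# Hypothetical toy parameters, chosen for illustration only.
MAX_ORDER = 8        # largest order quantity; also bounds each age class
MAX_DEMAND = 12      # demand is truncated at this value
GAMMA = 0.95         # discount factor
C_ORDER, C_HOLD, C_SHORT, C_WASTE = 1.0, 0.1, 5.0, 2.0

demands = jnp.arange(MAX_DEMAND + 1)
demand_probs = poisson.pmf(demands, 4.0)
demand_probs = demand_probs / demand_probs.sum()  # renormalize after truncation

n = MAX_ORDER + 1
# State (x1, x2): units with 1 and 2 periods of remaining life at the start of a period.
grid = jnp.meshgrid(jnp.arange(n), jnp.arange(n), indexing="ij")
states = jnp.stack(grid, axis=-1).reshape(-1, 2)
actions = jnp.arange(n)


def transition(state, action, demand):
    """One period: order arrives, demand is met oldest-first, stock ages."""
    x1, x2 = state[0], state[1]
    from_x1 = jnp.minimum(x1, demand)
    from_x2 = jnp.minimum(x2, demand - from_x1)
    from_new = jnp.minimum(action, demand - from_x1 - from_x2)
    shortage = demand - from_x1 - from_x2 - from_new
    wasted = x1 - from_x1            # oldest units left over expire
    next_x1 = x2 - from_x2           # remaining units age by one period
    next_x2 = action - from_new
    cost = (C_ORDER * action + C_HOLD * (next_x1 + next_x2)
            + C_SHORT * shortage + C_WASTE * wasted)
    return next_x1, next_x2, -cost


def q_value(state, action, values):
    """Expected discounted return of one action, averaging over demand."""
    nx1, nx2, reward = jax.vmap(transition, in_axes=(None, None, 0))(
        state, action, demands)
    return jnp.dot(demand_probs, reward + GAMMA * values[nx1, nx2])


def bellman_update(state, values):
    """Best value and action for a single state."""
    q = jax.vmap(q_value, in_axes=(None, 0, None))(state, actions, values)
    return q.max(), actions[q.argmax()]


@jax.jit
def sweep(values):
    """Update every state in parallel by mapping over the state array."""
    new_values, policy = jax.vmap(bellman_update, in_axes=(0, None))(states, values)
    return new_values.reshape(n, n), policy.reshape(n, n)


def value_iteration(tol=1e-3, max_iter=1000):
    values = jnp.zeros((n, n))
    policy = jnp.zeros((n, n), dtype=actions.dtype)
    for _ in range(max_iter):
        new_values, policy = sweep(values)
        delta = jnp.max(jnp.abs(new_values - values))
        values = new_values
        if delta < tol:
            break
    return values, policy


values, policy = value_iteration()
print("Order quantity when no stock is held:", int(policy[0, 0]))
```

The three nested `vmap` calls map over states, actions and demand outcomes respectively, so the compiled `sweep` evaluates the full Bellman update as batched array operations that can run on a GPU when one is available. This toy problem has only 81 states; the problems described in the abstract are far larger and include features, such as substitution between products, that this sketch omits.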

Keywords: Inventory; Markov decision processes; Dynamic programming; Simulation; Reinforcement learning
Date: 2025
Downloads:
http://link.springer.com/10.1007/s10479-025-06551-6 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:annopr:v:349:y:2025:i:3:d:10.1007_s10479-025-06551-6

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10479

DOI: 10.1007/s10479-025-06551-6

Annals of Operations Research is currently edited by Endre Boros

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.