Deep Reinforcement Learning for One-Warehouse Multi-Retailer inventory management

Kaynov, Illya; van Knippenberg, Marijn; Menkovski, Vlado; van Breemen, Albert; van Jaarsveld, Willem

International Journal of Production Economics, 2024, vol. 267, issue C

Abstract: The One-Warehouse Multi-Retailer (OWMR) system is the prototypical distribution and inventory system. Many OWMR variants exist; for example, demand in excess of supply may be completely back-ordered, partially back-ordered, or lost. Prior research has focused on heuristic reordering policies, such as echelon base-stock levels, coupled with heuristic allocation policies. Constructing well-performing policies is time-consuming and must be redone for every problem variant. By contrast, Deep Reinforcement Learning (DRL) is a general-purpose technique for sequential decision making that has yielded good results for various challenging inventory systems. However, applying DRL to OWMR problems is nontrivial, since allocation involves setting a quantity for each retailer: the number of possible allocations grows exponentially in the number of retailers. Since each action is typically associated with a neural network output node, this renders standard DRL techniques intractable. Our proposed DRL algorithm instead infers a multi-discrete action distribution whose number of output nodes grows linearly in the number of retailers. Moreover, for the case where total retailer orders exceed the available warehouse inventory, we propose a random rationing policy that substantially improves the ability of standard DRL algorithms to train good policies because it promotes the learning of feasible retailer order quantities. The resulting algorithm outperforms general-purpose benchmark policies by ∼1–3% for the lost sales case and by ∼12–20% for the partial back-ordering case. For complete back-ordering, the algorithm cannot consistently outperform the benchmark.
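
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-retailer order cap `max_order`, and the unit-by-unit sampling rule used for rationing are all assumptions made here for illustration, since the abstract does not specify the rationing mechanism. The first two functions compare the number of output nodes needed when every joint allocation is its own action with the number needed when each retailer has its own categorical head. The third shows one plausible way to ration stock at random when orders exceed the available inventory.

```python
import random


def joint_action_count(num_retailers, max_order):
    """Output nodes needed if every joint allocation is a separate action."""
    return (max_order + 1) ** num_retailers


def multi_discrete_output_count(num_retailers, max_order):
    """Output nodes needed with one categorical head per retailer."""
    return num_retailers * (max_order + 1)


def random_rationing(orders, available, rng):
    """Hypothetical rationing rule: if the orders are infeasible, pick
    `available` of the requested units uniformly at random and ship those."""
    if sum(orders) <= available:
        return list(orders)  # feasible orders are shipped unchanged
    # One token per requested unit, labelled with its retailer index.
    units = [i for i, quantity in enumerate(orders) for _ in range(quantity)]
    allocation = [0] * len(orders)
    for retailer in rng.sample(units, available):
        allocation[retailer] += 1
    return allocation


if __name__ == "__main__":
    print(joint_action_count(10, 10), multi_discrete_output_count(10, 10))
    print(random_rationing([3, 5, 4], 6, random.Random(0)))
```

With 10 retailers and order quantities from 0 to 10, the joint formulation needs 11^10 (about 2.6 × 10^10) output nodes, whereas the multi-discrete formulation needs 110. Under the assumed rationing rule, the shipped quantities always sum to the available inventory and never exceed what a retailer ordered.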

Keywords: Multi-echelon inventory control; Deep Reinforcement Learning; Allocation policies
Date: 2024

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0925527323003201
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:proeco:v:267:y:2024:i:c:s0925527323003201

DOI: 10.1016/j.ijpe.2023.109088

International Journal of Production Economics is currently edited by Stefan Minner

Bibliographic data for series maintained by Catherine Liu.