Deep Reinforcement Learning for Optimal Replenishment in Stochastic Assembly Systems
Lativa Sid Ahmed Abdellahi,
Zeinebou Zoubeir,
Yahya Mohamed,
Ahmedou Haouba and
Sidi Hmetty
Additional contact information
Lativa Sid Ahmed Abdellahi: Department of Mathematics and Computer Science, Faculty of Science and Technology, University of Nouakchott, Nouakchott BP 5026, Mauritania
Zeinebou Zoubeir: Department of Mathematics and Industrial Engineering, Institute of Industrial Engineering, University of Nouakchott, Nouakchott BP 5026, Mauritania
Yahya Mohamed: Analysis and Modeling for Environment and Health (UMR-AMES), Department of Quantitative Techniques, Faculty of Economics and Management, University of Nouakchott, Nouakchott BP 5026, Mauritania
Ahmedou Haouba: Department of Mathematics and Computer Science, Faculty of Science and Technology, University of Nouakchott, Nouakchott BP 5026, Mauritania
Sidi Hmetty: Department of Mathematics and Computer Science, Faculty of Science and Technology, University of Nouakchott, Nouakchott BP 5026, Mauritania
Mathematics, 2025, vol. 13, issue 14, 1-29
Abstract:
This study presents a reinforcement learning-based approach to optimizing replenishment policies under uncertainty, with the objective of minimizing total costs, including inventory holding, shortage, and ordering costs. The focus is on single-level assembly systems, where both component delivery lead times and finished-product demand are random. The problem is formulated as a Markov decision process (MDP) in which an agent determines order quantities for each component while accounting for stochastic lead times and demand variability. The Deep Q-Network (DQN) algorithm is adapted and employed to learn replenishment policies over a fixed planning horizon. To enhance learning performance, we develop a tailored simulation environment that captures multi-component interactions, random lead times, and variable demand, together with a modular and realistic cost structure. The environment supports dynamic state transitions, lead-time sampling, and flexible order-reception modeling, providing a high-fidelity training ground for the agent. To further improve convergence and policy quality, we incorporate local search mechanisms and multiple action-space discretizations per component. Simulation results show that the proposed method converges to stable ordering policies after approximately 100 episodes. The agent achieves an average service level of 96.93%, and stockout events are nearly eliminated relative to early training phases. The system maintains component inventories within operationally feasible ranges, and the holding, shortage, and ordering cost components are consistently reduced across 500 training episodes. These findings highlight the potential of deep reinforcement learning as a data-driven and adaptive approach to inventory management in complex and uncertain supply chains.
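To make the setting concrete, the sketch below outlines a minimal simulation environment of the kind the abstract describes: a single-level assembly system in which orders arrive after random lead times, finished-product demand is stochastic, and the per-period cost combines holding, shortage, and ordering terms. All class names, parameter values, and distributions here are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

# Minimal sketch of the single-level assembly replenishment MDP described
# above. All names, parameter values, and distributions (Poisson demand,
# uniform 1-3 period lead times, cost coefficients) are illustrative
# assumptions, not the authors' actual configuration.

class AssemblyReplenishmentEnv:
    def __init__(self, n_components=3, horizon=52, seed=0,
                 holding_cost=1.0, shortage_cost=10.0, ordering_cost=5.0):
        self.n = n_components
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)
        self.h, self.p, self.k = holding_cost, shortage_cost, ordering_cost

    def reset(self):
        self.t = 0
        self.inventory = np.full(self.n, 20.0)  # on-hand stock per component
        self.pipeline = []                      # (arrival_period, component, qty)
        return self._state()

    def _state(self):
        # State: per-component inventory plus normalized time in the horizon.
        return np.concatenate([self.inventory, [self.t / self.horizon]])

    def step(self, order_qty):
        # Place orders; each order's delivery lead time is sampled at random.
        for i, q in enumerate(order_qty):
            if q > 0:
                lead = int(self.rng.integers(1, 4))  # 1 to 3 periods
                self.pipeline.append((self.t + lead, i, q))
        # Receive orders whose lead time has elapsed.
        arrived = [o for o in self.pipeline if o[0] <= self.t]
        self.pipeline = [o for o in self.pipeline if o[0] > self.t]
        for _, i, q in arrived:
            self.inventory[i] += q
        # Stochastic finished-product demand; assembling one unit consumes
        # one unit of every component, so output is capped by the scarcest one.
        demand = int(self.rng.poisson(8))
        assembled = min(demand, int(self.inventory.min()))
        self.inventory -= assembled
        shortage = demand - assembled
        # Per-period cost: holding + shortage + fixed ordering; reward = -cost.
        cost = (self.h * self.inventory.sum()
                + self.p * shortage
                + self.k * np.count_nonzero(np.asarray(order_qty)))
        self.t += 1
        return self._state(), -cost, self.t >= self.horizon


# Illustrative rollout with a random placeholder policy; a trained DQN agent
# would instead map the state vector to one of several discretized order
# quantities per component.
env = AssemblyReplenishmentEnv()
state, done, total_cost = env.reset(), False, 0.0
while not done:
    action = env.rng.integers(0, 30, size=env.n)
    state, reward, done = env.step(action)
    total_cost -= reward
print(f"total cost over one episode: {total_cost:.1f}")
```

Under this framing, training a DQN amounts to replacing the random policy with a Q-network over the discretized order quantities and optimizing it episode by episode against the negative-cost return.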
Keywords: assembly system; inventory management; replenishment planning; stochastic demand; uncertain lead times; deep reinforcement learning; deep Q-network (DQN); data-driven inventory management
JEL-codes: C
Date: 2025
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/14/2229/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/14/2229/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:14:p:2229-:d:1698190