Design and calibration of a DRL algorithm for solving the job shop scheduling problem under unexpected job arrivals
Nour El Houda Hammami (University of Tunis El Manar),
Benoit Lardeux (ISEN Yncréa Ouest),
Atidel B. Hadj-Alouane (University of Tunis El Manar) and
Maher Jridi (ISEN Yncréa Ouest)
Flexible Services and Manufacturing Journal, 2025, vol. 37, issue 1, No 5, 125-156
Abstract:
This paper proposes a Deep Reinforcement Learning (DRL) based approach to solve the real-time Job Shop Scheduling Problem (JSSP) under unexpected job arrivals. The approach combines a DRL algorithm, Proximal Policy Optimization Actor-Critic (PPO-AC), with an event-driven rescheduling strategy to solve a bi-objective decision problem. PPO-AC models an agent interacting with its environment and aiming to achieve a predefined goal by maximizing the total cumulative reward. In this work, the total cumulative reward is defined as the negative of the optimization objective function, expressed as the weighted sum of the completion time of the generated schedule (efficiency criterion) and its deviation from an initially generated schedule (stability criterion). The agent thus minimizes the objective function by maximizing the total cumulative reward. To the best of our knowledge, no prior work has addressed schedule stability while using DRL algorithms. A Graph Neural Network (GNN) architecture is used to model environment states, enhancing the approach's adaptability. Training experiments are conducted to calibrate the algorithm. A sensitivity analysis of the deviation weight parameter evaluates the impact of its variation on the proposed model; results indicate that the model is robust to such variation. For a fixed deviation weight value, the algorithm is compared to CP Optimizer, IBM's constraint programming solver, and to a Mixed Integer Program, to assess its performance. Results reveal that, for small batches of arriving jobs, PPO-AC succeeds in solving the problem in real time with low gaps to the optimal solution.
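As a minimal sketch of the reward design summarized in the abstract (the symbols below are placeholders, not the paper's own notation: S denotes the generated schedule, S_0 the initial schedule, C_max the completion time, Delta the deviation measure, and alpha, beta the objective weights), the agent's cumulative reward can be written as the negative of the weighted-sum objective:

R(S) = -\left( \alpha \, C_{\max}(S) + \beta \, \Delta(S, S_{0}) \right)

so that maximizing R(S) is equivalent to minimizing the weighted sum of the efficiency and stability criteria.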
Keywords: Real-time JSSP; Unexpected job arrival; Proximal policy optimization; Actor critic; Graph neural network; Schedule stability
Date: 2025
Downloads: http://link.springer.com/10.1007/s10696-024-09540-2 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:flsman:v:37:y:2025:i:1:d:10.1007_s10696-024-09540-2
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10696
DOI: 10.1007/s10696-024-09540-2
Flexible Services and Manufacturing Journal is currently edited by Hans Günther