Deep Q-Networks for Minimizing Total Tardiness on a Single Machine

Huang, Kuan Wei; Lin, Bertrand M. T.

Deep Q-Networks for Minimizing Total Tardiness on a Single Machine

Kuan Wei Huang and Bertrand M. T. Lin ()
Additional contact information
Kuan Wei Huang: Institute of Information Management, Institute of Hospital and Health Care Administration, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
Bertrand M. T. Lin: Institute of Information Management, Institute of Hospital and Health Care Administration, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan

Mathematics, 2024, vol. 13, issue 1, 1-22

Abstract: This paper considers the single-machine scheduling problem of total tardiness minimization. Due to its computational intractability, exact approaches such as dynamic programming algorithms and branch-and-bound algorithms struggle to produce optimal solutions for large-scale instances in a reasonable time. The advent of Deep Q-Networks (DQNs) within the reinforcement learning paradigm could be a viable approach to transcending these limitations, offering a robust and adaptive approach. This study introduces a novel approach utilizing DQNs to model the complexities of job scheduling for minimizing tardiness through an informed selection utilizing look-ahead mechanisms of actions within a defined state space. The framework incorporates seven distinct reward-shaping strategies, among which the Minimum Estimated Future Tardiness strategy notably enhances the DQN model’s performance. Specifically, it achieves an average improvement of 14.33% over Earliest Due Date (EDD), 11.90% over Shortest Processing Time (SPT), 17.65% over Least Slack First (LSF), and 8.86% over Apparent Tardiness Cost (ATC). Conversely, the Number of Delayed Jobs strategy secures an average improvement of 11.56% over EDD, 9.10% over SPT, 15.01% over LSF, and 5.99% over ATC, all while requiring minimal computational resources. The results of a computational study demonstrate DQN’s impressive performance compared to traditional heuristics. This underscores the capacity of advanced machine learning techniques to improve industrial scheduling processes, potentially leading to decent operational efficiency.

Keywords: single-machine scheduling; total tardiness; machine learning; deep Q-networks; heuristic rules; short-term rewards; long-term rewards (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/1/62/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/1/62/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2024:i:1:p:62-:d:1554850

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().