Reinforcement learning-guided adaptive large neighborhood search for vehicle routing problem with time windows
Zhaohui Wang,
Qiao Cui,
Bin Tan,
Xiao Yang,
Weibang Zhou and
Xiangsheng Huang
Additional contact information
Zhaohui Wang: China Satellite Network Digital Technology Co., Ltd.
Qiao Cui: China Satellite Network Digital Technology Co., Ltd.
Bin Tan: China Satellite Network Digital Technology Co., Ltd.
Xiao Yang: University of Chinese Academy of Sciences
Weibang Zhou: University of Chinese Academy of Sciences
Xiangsheng Huang: Xiong’an Institute of Innovation
Journal of Combinatorial Optimization, 2025, vol. 50, issue 4, No 4, 24 pages
Abstract:
This paper presents PPO-ALNS, a novel hybrid framework that integrates Proximal Policy Optimization (PPO) with Adaptive Large Neighborhood Search (ALNS) for solving the Vehicle Routing Problem with Time Windows (VRPTW). Traditional heuristic methods struggle to balance exploration and exploitation, while pure reinforcement learning approaches face challenges with discrete decision spaces and complex constraints. Our approach addresses these limitations by formulating ALNS as a Markov Decision Process, where a PPO agent dynamically guides destroy-repair operator selection, solution acceptance criteria, and search termination decisions. We design a comprehensive state representation incorporating problem-specific features and search dynamics, coupled with a four-dimensional action space for effective search coordination. The compound reward mechanism combines immediate rewards with terminal episode rewards, introducing dynamic penalties and adaptive early stopping to optimize solution quality and computational efficiency. Experiments on VRPTW instances with 20, 50, and 100 customers demonstrate that PPO-ALNS consistently outperforms traditional ALNS, Ant Colony Optimization (ACO), and other Reinforcement Learning (RL)-based operator selection methods, achieving improvements of 11.37–17.41% over baseline approaches. The performance advantage increases with problem complexity, indicating strong scalability. Ablation studies validate the synergistic effect of simultaneous operator learning, while sensitivity analysis confirms optimal performance in medium time window scenarios, highlighting the framework's effectiveness for complex routing optimization.
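The search loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption-based toy: the operator names, the surrogate cost perturbation, and the uniform-random stand-in for the trained PPO policy are placeholders, not the paper's actual operators, state features, or reward weights. It only shows the shape of the four-dimensional action (destroy operator, repair operator, acceptance decision, termination decision) driving an ALNS-style episode.

```python
import random

# Illustrative operator pools (hypothetical names, not from the paper).
DESTROY_OPS = ["random_removal", "worst_removal", "shaw_removal"]
REPAIR_OPS = ["greedy_insertion", "regret_2_insertion"]


def policy(state):
    """Stand-in for the trained PPO actor: samples a 4-dimensional action
    uniformly at random. In the actual framework this would be a neural
    policy conditioned on problem features and search dynamics."""
    return (
        random.randrange(len(DESTROY_OPS)),  # which destroy operator
        random.randrange(len(REPAIR_OPS)),   # which repair operator
        random.random() < 0.5,               # accept the candidate solution?
        random.random() < 0.05,              # terminate the search episode?
    )


def destroy_and_repair(cost, d_idx, r_idx):
    """Toy surrogate for a destroy-repair move: perturbs a scalar route cost.
    A real implementation would remove and reinsert customers subject to
    time-window and capacity constraints."""
    return max(0.0, cost + random.uniform(-5.0, 5.0))


def alns_episode(initial_cost, max_iters=200):
    current, best = initial_cost, initial_cost
    for t in range(max_iters):
        state = (current, best, t)                  # simplified state features
        d_idx, r_idx, accept, stop = policy(state)  # four-dimensional action
        candidate = destroy_and_repair(current, d_idx, r_idx)

        # Step component of a compound reward: improvement over the incumbent.
        # A terminal episode reward would be added when the episode ends.
        reward = current - candidate

        if accept or candidate < current:
            current = candidate
        best = min(best, current)

        if stop:  # agent-chosen early stopping
            break
    return best


if __name__ == "__main__":
    random.seed(0)
    print("best cost found:", alns_episode(initial_cost=100.0))
```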
Keywords: Reinforcement learning; Adaptive large neighborhood search; Vehicle routing problem with time windows; Proximal policy optimization
Date: 2025
Downloads:
http://link.springer.com/10.1007/s10878-025-01364-6 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:jcomop:v:50:y:2025:i:4:d:10.1007_s10878-025-01364-6
Ordering information: This journal article can be ordered from
https://www.springer.com/journal/10878
DOI: 10.1007/s10878-025-01364-6
Journal of Combinatorial Optimization is currently edited by My T. Thai.
More articles in Journal of Combinatorial Optimization from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.