A reinforcement learning-based hyper-heuristic for AGV task assignment and route planning in parts-to-picker warehouses

Li, Kunpeng; Liu, Tengbo; Ram Kumar, P.N.; Han, Xuefang

Transportation Research Part E: Logistics and Transportation Review, 2024, vol. 185, issue C

Abstract: E-commerce warehouses worldwide have begun implementing robotic mobile fulfillment systems (RMFS), which improve order-picking efficiency by using automated guided vehicles (AGVs) to bring parts to pickers. Each AGV departs from its initial point, moves to a target rack position, and transports the rack to a picking station. After the workers have picked the required items, the AGV returns the rack to its original position. When all tasks are completed, the AGVs return to their starting points. In this context, the main challenge is the task assignment and route planning of multiple AGVs to minimize travel times. We formulate a mixed-integer linear programming (MILP) model with valid inequalities to solve small problem instances optimally, and we introduce a reinforcement learning (RL)-based hyper-heuristic (HH) framework to solve large instances to near-optimality. A typical HH framework comprises two levels: high-level heuristics (HLH) and low-level heuristics (LLH). The framework starts from an initial solution and improves it iteratively through LLHs, while the HLH invokes a selection strategy to choose which LLH to apply and an acceptance criterion to decide whether to keep the resulting solution. We propose a novel selection strategy, Co-SLMAB, based on an improved multi-armed bandit algorithm, and use Exponential Monte Carlo with counters (EMCQ) as the acceptance criterion. Collision avoidance rules are then formulated for the different conflict types to construct conflict-free travel routes for the AGVs. Besides testing the proposed framework’s effectiveness in real-life warehouse layouts, we perform extensive computational experiments and a thorough sensitivity analysis.
The results show that (i) the proposed valid inequalities aid in obtaining better lower bounds and significantly speed up the solution process; (ii) the Co-SLMAB-HH framework is competitive with CPLEX and outperforms the other tested hyper-heuristics and the problem-specific heuristic with respect to convergence and computation time; and (iii) a pool of LLHs consisting of a wide range of different operators is advantageous over a limited set of simple operators when solving problems with hyper-heuristics.
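
The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The selection strategy is plain UCB1, a standard multi-armed bandit rule used here as a stand-in for Co-SLMAB, which the abstract does not specify. The acceptance rule is a simplified exponential Monte Carlo test with a stall counter, assumed for illustration and not confirmed to match the paper's EMCQ. The problem is a toy one (a single closed tour over 2-D points, not multi-AGV routing with collision avoidance), and all function names are hypothetical.

```python
import math
import random


def tour_length(tour, pts):
    """Length of the closed tour that visits pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


# Low-level heuristics (LLHs): each returns a modified copy of the tour.
def swap(tour, rng):
    a, b = rng.sample(range(len(tour)), 2)
    t = tour[:]
    t[a], t[b] = t[b], t[a]
    return t


def reverse_segment(tour, rng):
    a, b = sorted(rng.sample(range(len(tour)), 2))
    return tour[:a] + tour[a:b + 1][::-1] + tour[b + 1:]


def relocate(tour, rng):
    t = tour[:]
    node = t.pop(rng.randrange(len(t)))
    t.insert(rng.randrange(len(t) + 1), node)
    return t


LLHS = [swap, reverse_segment, relocate]


def hyper_heuristic(pts, iterations=2000, seed=0):
    rng = random.Random(seed)
    current = list(range(len(pts)))
    rng.shuffle(current)  # random initial solution
    cur_cost = tour_length(current, pts)
    best, best_cost = current[:], cur_cost
    pulls = [0] * len(LLHS)         # how often each LLH was chosen
    reward_sum = [0.0] * len(LLHS)  # reward 1 whenever an LLH improved the solution
    stall = 0                       # consecutive non-improving iterations

    for t in range(1, iterations + 1):
        # High level, selection: UCB1 (assumed stand-in for Co-SLMAB).
        if 0 in pulls:
            arm = pulls.index(0)  # try every LLH once first
        else:
            arm = max(
                range(len(LLHS)),
                key=lambda k: reward_sum[k] / pulls[k]
                + math.sqrt(2 * math.log(t) / pulls[k]),
            )
        candidate = LLHS[arm](current, rng)
        cand_cost = tour_length(candidate, pts)
        delta = cand_cost - cur_cost
        pulls[arm] += 1
        reward_sum[arm] += 1.0 if delta < 0 else 0.0

        # High level, acceptance: simplified exponential Monte Carlo rule.
        # Worse moves become less likely as t grows and more likely as the
        # stall counter grows (assumed form, for illustration only).
        stall = 0 if delta < 0 else stall + 1
        if delta <= 0 or rng.random() < math.exp(-delta * t / stall):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost

    return best, best_cost, pulls


if __name__ == "__main__":
    demo_rng = random.Random(1)
    points = [(demo_rng.random(), demo_rng.random()) for _ in range(30)]
    tour, cost, counts = hyper_heuristic(points)
    print(f"best tour length: {cost:.3f}, LLH usage: {counts}")
```

The `pulls` list returned by the sketch shows how often the bandit chose each operator, which is the kind of information a selection strategy uses to favor LLHs that have recently paid off.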

Keywords: Parts-to-picker picking system; Automated Guided Vehicles; Task scheduling; Reinforcement learning; Hyper-heuristic
Date: 2024

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S1366554524001091
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:transe:v:185:y:2024:i:c:s1366554524001091

Ordering information: This journal article can be ordered from
http://www.elsevier.com/wps/find/journaldescription.cws_home/600244/bibliographic

DOI: 10.1016/j.tre.2024.103518

Transportation Research Part E: Logistics and Transportation Review is currently edited by W. Talley

Bibliographic data for series maintained by Catherine Liu.