A Multi-Depot Dynamic Vehicle Routing Problem with Stochastic Road Capacity: An MDP Model and Dynamic Policy for Post-Decision State Rollout Algorithm in Reinforcement Learning
Wadi Khalid Anuar,
Lai Soon Lee,
Hsin-Vonn Seow and
Stefan Pickl
Additional contact information
Wadi Khalid Anuar: Department of Logistics and Transportation, School of Technology Management and Logistics, Universiti Utara Malaysia, Sintok 06010, Kedah, Malaysia
Lai Soon Lee: Laboratory of Computational Statistics and Operations Research, Institute for Mathematical Research, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia
Hsin-Vonn Seow: Faculty of Arts and Social Sciences, Nottingham University Business School, University of Nottingham Malaysia Campus, Semenyih 43500, Selangor, Malaysia
Stefan Pickl: Fakultät für Informatik, Universität der Bundeswehr München, 85577 Neubiberg, Germany
Mathematics, 2022, vol. 10, issue 15, 1-70
Abstract:
In the event of a disaster, the road network is often compromised in terms of its capacity and usability conditions. This is a challenge for humanitarian operations in the context of delivering critical medical supplies. To optimise vehicle routing for such a problem, a Multi-Depot Dynamic Vehicle-Routing Problem with Stochastic Road Capacity (MDDVRPSRC) is formulated as a Markov Decision Processes (MDP) model. An Approximate Dynamic Programming (ADP) solution method is adopted where the Post-Decision State Rollout Algorithm (PDS-RA) is applied as the lookahead approach. To perform the rollout effectively for the problem, the PDS-RA is executed for all vehicles assigned for the problem. Then, at the end, a decision is made by the agent. Five types of constructive base heuristics are proposed for the PDS-RA. First, the Teach Base Insertion Heuristic (TBIH-1) is proposed to study the partial random construction approach for the non-obvious decision. The heuristic is extended by proposing TBIH-2 and TBIH-3 to show how Sequential Insertion Heuristic (SIH) (I1) as well as Clarke and Wright (CW) could be executed, respectively, in a dynamic setting as a modification to the TBIH-1. Additionally, another two heuristics: TBIH-4 and TBIH-5 (TBIH-1 with the addition of Dynamic Lookahead SIH (DLASIH) and Dynamic Lookahead CW (DLACW) respectively) are proposed to improve the on-the-go constructed decision rule (dynamic policy on the go) in the lookahead simulations. The results obtained are compared with the matheuristic approach from previous work based on PDS-RA.
Keywords: reinforcement learning; Markov decision processes; approximate dynamic programming; rollout algorithm; constructive base heuristic; vehicle routing problem (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
https://www.mdpi.com/2227-7390/10/15/2699/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/15/2699/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:15:p:2699-:d:876208
Access Statistics for this article
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().