Online model-based reinforcement learning for decision-making in long distance routes

Alcaraz, Juan J.; Losilla, Fernando; Caballero-Arnaldos, Luis

Transportation Research Part E: Logistics and Transportation Review, 2022, vol. 164, issue C

Abstract: In road transportation, long-distance routes require scheduled driving times, breaks, and rest periods, in compliance with the regulations on working conditions for truck drivers, while ensuring goods are delivered within the time windows of each customer. However, routes are subject to uncertain travel and service times, and incidents may cause additional delays, making predefined schedules ineffective in many real-life situations. This paper presents a reinforcement learning (RL) algorithm capable of making en-route decisions regarding driving times, breaks, and rest periods, under uncertain conditions. Our proposal aims at maximizing the likelihood of on-time delivery while complying with drivers’ work regulations. We use an online model-based RL strategy that needs no prior training and is more flexible than model-free RL approaches, where the agent must be trained offline before making online decisions. Our proposal combines model predictive control with a rollout strategy and Monte Carlo tree search. At each decision stage, our algorithm anticipates the consequences of all the possible decisions in a number of future stages (the lookahead horizon), and then uses a base policy to generate a sequence of decisions beyond the lookahead horizon. This base policy could be, for example, a set of decision rules based on the experience and expertise of the transportation company covering the routes. Our numerical results show that the policy obtained using our algorithm outperforms not only the base policy (up to 83%), but also a policy obtained offline using deep Q networks (DQN), a state-of-the-art, model-free RL algorithm.
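The decision loop described in the abstract (look ahead a few stages, then let a base policy finish the route) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy route model, the constants, the function names, and the two-action set ("drive", "break"). It also replaces Monte Carlo tree search with a plain sampled lookahead, so it should be read as a simplified instance of lookahead plus rollout, not as the authors' algorithm.

```python
import random

# Hypothetical toy model (not the paper's): time advances in unit steps and
# every constant below is an illustrative assumption.
MAX_STREAK = 4    # consecutive driving steps allowed before a break is mandatory
DEADLINE = 14     # the delivery window closes after this many steps
DISTANCE = 10.0   # route length in arbitrary units

def legal_actions(state):
    """Only 'break' is allowed once the driving limit is reached."""
    _, streak, _ = state
    return ["break"] if streak >= MAX_STREAK else ["drive", "break"]

def step(state, action, rng):
    """Sample one transition; progress per driving step is uncertain."""
    dist, streak, t = state
    if action == "drive":
        return (dist - rng.uniform(0.6, 1.4), streak + 1, t + 1)
    return (dist, 0, t + 1)

def is_terminal(state):
    dist, _, t = state
    return dist <= 0 or t >= DEADLINE

def reward(state):
    """1 for an on-time arrival, 0 otherwise (call on terminal states only)."""
    return 1.0 if state[0] <= 0 else 0.0

def base_policy(state):
    """A cautious rule of thumb: take a break after every two driving steps."""
    return "break" if state[1] >= 2 else "drive"

def rollout(state, rng):
    """Follow the base policy to the end of the route and return the outcome."""
    while not is_terminal(state):
        state = step(state, base_policy(state), rng)
    return reward(state)

def q_value(state, action, depth, n_samples, rng):
    """Monte Carlo estimate of the on-time probability after taking `action`."""
    total = 0.0
    for _ in range(n_samples):
        nxt = step(state, action, rng)
        if is_terminal(nxt):
            total += reward(nxt)
        elif depth <= 1:
            total += rollout(nxt, rng)  # beyond the horizon: base policy
        else:
            total += max(q_value(nxt, a, depth - 1, n_samples, rng)
                         for a in legal_actions(nxt))
    return total / n_samples

def lookahead_policy(state, rng, depth=2, n_samples=8):
    """Pick the legal action with the best estimate (ties go to the first listed)."""
    return max(legal_actions(state),
               key=lambda a: q_value(state, a, depth, n_samples, rng))

def on_time_rate(policy, episodes, seed):
    """Fraction of simulated routes delivered within the deadline."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(episodes):
        state = (DISTANCE, 0, 0)
        while not is_terminal(state):
            state = step(state, policy(state, rng), rng)
        wins += reward(state)
    return wins / episodes

if __name__ == "__main__":
    print("base:", on_time_rate(lambda s, rng: base_policy(s), 200, seed=0))
    print("lookahead:", on_time_rate(lookahead_policy, 30, seed=0))
```

In this toy setting the difference between the two policies is easiest to see in a state such as `(0.5, 2, DEADLINE - 1)`: the base rule asks for a break and therefore misses the deadline, whereas the lookahead finds that one more driving step completes the route. No training phase is involved; the estimates are recomputed from the simulator at every decision stage, which is the sense in which such an approach is online.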

Keywords: Route scheduling; Reinforcement learning; Model predictive control; Monte Carlo tree search
Date: 2022

Downloads:
http://www.sciencedirect.com/science/article/pii/S136655452200179X
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:transe:v:164:y:2022:i:c:s136655452200179x

Ordering information: This journal article can be ordered from
http://www.elsevier.com/wps/find/journaldescription.cws_home/600244/bibliographic

DOI: 10.1016/j.tre.2022.102790

Transportation Research Part E: Logistics and Transportation Review is currently edited by W. Talley
