Learning State-Dependent Policy Parametrizations for Dynamic Technician Routing with Rework
Jonas Stein,
Florentin D. Hildebrandt,
Marlin W. Ulmer and
Barrett W. Thomas
Additional contact information
Jonas Stein: Faculty of Economics and Management, Management Science, Otto-von-Guericke-Universität Magdeburg, 39106 Magdeburg, Germany
Florentin D. Hildebrandt: Faculty of Economics and Management, Management Science, Otto-von-Guericke-Universität Magdeburg, 39106 Magdeburg, Germany
Marlin W. Ulmer: Faculty of Economics and Management, Management Science, Otto-von-Guericke-Universität Magdeburg, 39106 Magdeburg, Germany
Barrett W. Thomas: Department of Business Analytics, Tippie College of Business, University of Iowa, Iowa City, Iowa 52242
Transportation Science, 2025, vol. 59, issue 5, 1153-1171
Abstract:
Home repair and installation services require technicians to visit customers and resolve tasks of different complexities. Technicians often have heterogeneous skills. The geographical spread of customers makes achieving only “ideal” matches between technician skills and task requirements impractical. Additionally, technicians are regularly absent, for example, due to sickness. When the match between task requirements and technician skills is nonideal, some tasks may remain unresolved and require a revisit and rework on a later day, leading to delayed service. For this sequential decision problem, we iteratively build tours every day by adding “important” customers. Importance is derived from analytical considerations and integrates urgency of service, routing efficiency, and risk of rework. We propose a state-dependent balance of these factors via reinforcement learning. We rely on proximal policy optimization (PPO) tailored to the problem specifics and analyze the implications of specific algorithmic augmentations. A comprehensive study shows that accepting a few nonideal assignments can be quite beneficial for the overall service quality. Furthermore, in states where many technicians are sick and many customers have overdue service deadlines, prioritizing service urgency is crucial. Conversely, in states with fewer sick technicians and fewer customers with overdue deadlines, routing efficiency should take precedence. We further demonstrate the value provided by a state-dependent parametrization via PPO.
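The tour-building idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear importance score, the single-technician setting, the append-to-end construction, and every name (Customer, importance, build_tour, choose_weights) are assumptions made for illustration. In particular, choose_weights is a hand-written rule standing in for the weights that the paper learns with PPO.

```python
import math
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    x: float
    y: float
    days_overdue: int    # hypothetical urgency proxy
    rework_prob: float   # hypothetical chance that the visit fails and needs rework


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def importance(c, pos, weights):
    """Assumed linear score: reward urgency, penalize travel and rework risk."""
    w_urgency, w_routing, w_rework = weights
    return (w_urgency * c.days_overdue
            - w_routing * dist(pos, (c.x, c.y))
            - w_rework * c.rework_prob)


def build_tour(customers, depot, weights, max_length):
    """Greedily append the most important customer that still allows a return to the depot."""
    tour, pos, length = [], depot, 0.0
    remaining = list(customers)
    while remaining:
        feasible = [
            c for c in remaining
            if length + dist(pos, (c.x, c.y)) + dist((c.x, c.y), depot) <= max_length
        ]
        if not feasible:
            break
        best = max(feasible, key=lambda c: importance(c, pos, weights))
        length += dist(pos, (best.x, best.y))
        pos = (best.x, best.y)
        tour.append(best.name)
        remaining.remove(best)
    return tour


def choose_weights(sick_technicians, overdue_customers):
    """Hand-written stand-in for a learned state-dependent parametrization."""
    if sick_technicians >= 2 and overdue_customers >= 3:
        return (5.0, 1.0, 1.0)   # stressed state: emphasize urgency
    return (1.0, 5.0, 1.0)       # calm state: emphasize routing efficiency


if __name__ == "__main__":
    customers = [
        Customer("A", 1.0, 0.0, days_overdue=0, rework_prob=0.1),
        Customer("B", 10.0, 0.0, days_overdue=5, rework_prob=0.1),
    ]
    for state in [(3, 4), (0, 0)]:
        weights = choose_weights(*state)
        print(state, build_tour(customers, (0.0, 0.0), weights, max_length=30.0))
```

In this toy instance, the stressed state visits the distant overdue customer first, while the calm state starts with the nearby one, mirroring the qualitative finding reported in the abstract.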
Keywords: stochastic dynamic technician routing; rework uncertainty; sequential decision making; reinforcement learning
Date: 2025
Downloads:
http://dx.doi.org/10.1287/trsc.2024.0844 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ortrsc:v:59:y:2025:i:5:p:1153-1171