A Learnheuristic Algorithm Based on Thompson Sampling for the Heterogeneous and Dynamic Team Orienteering Problem

Uguina, Antonio R.; Gomez, Juan F.; Panadero, Javier; Martínez-Gavara, Anna; Juan, Angel

Antonio R. Uguina, Juan F. Gomez, Javier Panadero, Anna Martínez-Gavara and Angel Juan
Additional contact information
Antonio R. Uguina: Research Center on Production Management and Engineering, Universitat Politècnica de València, 03801 Alcoy, Spain
Juan F. Gomez: Research Center on Production Management and Engineering, Universitat Politècnica de València, 03801 Alcoy, Spain
Javier Panadero: Department of Computer Architecture & Operating Systems, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
Anna Martínez-Gavara: Statistics and Operational Research Department, Universitat de València, Doctor Moliner, 50, Burjassot, 46100 València, Spain

Mathematics, 2024, vol. 12, issue 11, 1-19

Abstract: The team orienteering problem (TOP) is a well-studied optimization problem in Operations Research in which multiple vehicles visit a subset of nodes in a network to maximize the total reward collected within a given time limit. To capture the dynamic and uncertain conditions inherent in real-world transportation scenarios, we introduce a novel dynamic variant of the TOP that considers real-time changes in the environmental conditions affecting reward acquisition at each node. Specifically, we model dynamic environmental factors (such as traffic congestion, weather conditions, and the battery level of each vehicle) to reflect their impact on the probability of obtaining the reward when visiting each type of node in a heterogeneous network. To address this problem, we propose a learnheuristic optimization framework that combines a metaheuristic algorithm with Thompson sampling to make informed decisions in dynamic environments. We also conduct empirical experiments to assess the impact of varying reward probabilities on resource allocation and route planning in this dynamic TOP, where the reward behavior of a node may depend on the environmental conditions. Our numerical results indicate that the proposed learnheuristic algorithm outperforms static approaches, achieving up to 25% better performance in highly dynamic scenarios. These findings highlight the effectiveness of the approach in adapting to dynamic conditions and optimizing decision-making processes in transportation systems.
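
To make the Thompson sampling idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a Beta-Bernoulli model in which each pair of node type and environmental condition has an unknown probability of yielding its reward. The class `BetaThompsonSampler`, the function `pick_next_node`, and the candidate tuple layout are hypothetical names introduced only for illustration. The abstract does not specify how the learning component is coupled with the metaheuristic, so the greedy selection step shown here is an assumption.

```python
import random
from collections import defaultdict


class BetaThompsonSampler:
    """Beta-Bernoulli Thompson sampling over (node_type, condition) contexts.

    Assumption: collecting a reward is a Bernoulli trial whose success
    probability depends only on the node type and the current condition.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # Beta(1, 1) prior (uniform) for every context not yet observed.
        self._params = defaultdict(lambda: [1.0, 1.0])

    def sample(self, node_type, condition):
        """Draw one plausible success probability from the posterior."""
        alpha, beta = self._params[(node_type, condition)]
        return self._rng.betavariate(alpha, beta)

    def update(self, node_type, condition, collected):
        """Record one visit: `collected` is True if the reward was obtained."""
        params = self._params[(node_type, condition)]
        if collected:
            params[0] += 1.0  # one more success
        else:
            params[1] += 1.0  # one more failure

    def mean(self, node_type, condition):
        """Posterior mean of the success probability."""
        alpha, beta = self._params[(node_type, condition)]
        return alpha / (alpha + beta)


def pick_next_node(sampler, candidates, condition):
    """Return the id of the candidate with the highest sampled expected reward.

    `candidates` is a list of (node_id, node_type, reward) tuples
    (a hypothetical layout chosen for this sketch).
    """
    best = max(candidates, key=lambda c: c[2] * sampler.sample(c[1], condition))
    return best[0]
```

In a route-construction loop, one would call `pick_next_node` for the feasible candidates, visit the chosen node, and then call `update` with the observed outcome, so that the posterior for that context sharpens as more visits are recorded. Sampling from the posterior, rather than using its mean, is what keeps rarely visited node types under exploration.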

Keywords: combinatorial optimization; team orienteering problem; reinforcement learning; learnheuristics
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/11/1758/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/11/1758/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:11:p:1758-:d:1409314

Mathematics is currently edited by Ms. Emma He
