EconPapers    

A Scalable Reinforcement Learning Framework for Ultra-Reliable Low-Latency Spectrum Management in Healthcare Internet of Things

Adeel Iqbal, Ali Nauman, Tahir Khurshaid and Sang-Bong Rhee
Additional contact information
Adeel Iqbal: School of Computer Science and Engineering, Yeungnam University, Gyeongsan-si 38541, Republic of Korea
Ali Nauman: School of Computer Science and Engineering, Yeungnam University, Gyeongsan-si 38541, Republic of Korea
Tahir Khurshaid: Department of Electrical Engineering, Yeungnam University, Gyeongsan-si 38541, Republic of Korea
Sang-Bong Rhee: Department of Electrical Engineering, Yeungnam University, Gyeongsan-si 38541, Republic of Korea

Mathematics, 2025, vol. 13, issue 18, 1-27

Abstract: Healthcare Internet of Things (H-IoT) systems demand ultra-reliable low-latency communication (URLLC) to support critical functions such as remote monitoring, emergency response, and real-time diagnostics. However, spectrum scarcity and heterogeneous traffic patterns pose major challenges for centralized scheduling in dense H-IoT deployments. This paper proposes a multi-agent reinforcement learning (MARL) framework for dynamic, priority-aware spectrum management (PASM), in which cooperative MARL agents jointly optimize throughput, latency, energy efficiency, fairness, and blocking probability under varying traffic and channel conditions. Six learning strategies are developed and compared, including Q-Learning, Double Q-Learning, Deep Q-Network (DQN), Actor–Critic, Dueling DQN, and Proximal Policy Optimization (PPO), within a simulated H-IoT environment that captures heterogeneous traffic, device priorities, and realistic URLLC constraints. A comprehensive simulation study across scalable scenarios ranging from 3 to 50 devices demonstrates that PPO consistently outperforms all baselines, improving mean throughput by 6.2%, reducing 95th-percentile delay by 11.5%, increasing energy efficiency by 11.9%, lowering blocking probability by 33.3%, and accelerating convergence by 75.8% compared to the strongest non-PPO baseline. These findings establish PPO as a robust and scalable solution for QoS-compliant spectrum management in dense H-IoT environments, while Dueling DQN emerges as a competitive deep RL alternative.
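To make the priority-aware spectrum-access idea concrete, the sketch below implements tabular Q-Learning, the simplest of the six strategies the abstract names. It is a minimal toy, not the paper's simulator: the channel count, per-channel occupancy rates, and priority weights are all assumed for illustration. Each device-priority class (the state) learns which channel (the action) to request; a free channel yields a reward scaled by priority, while a busy channel yields a blocking penalty.

```python
import random

random.seed(42)

# Hypothetical parameters, not taken from the paper.
NUM_CHANNELS = 3
BUSY_PROB = [0.8, 0.5, 0.1]          # assumed per-channel occupancy rates
PRIORITY_CLASSES = ["emergency", "monitoring", "background"]
PRIORITY_WEIGHT = {"emergency": 3.0, "monitoring": 2.0, "background": 1.0}

ALPHA, EPSILON = 0.1, 0.1            # learning rate, exploration rate

# Q[s][a]: estimated reward of requesting channel a for priority class s.
Q = {s: [0.0] * NUM_CHANNELS for s in PRIORITY_CLASSES}

def step(state, action):
    """One access attempt: reward scales with device priority if the
    channel is free; a busy channel means the request is blocked (-1)."""
    busy = random.random() < BUSY_PROB[action]
    return -1.0 if busy else PRIORITY_WEIGHT[state]

def choose(state):
    """Epsilon-greedy channel selection."""
    if random.random() < EPSILON:
        return random.randrange(NUM_CHANNELS)          # explore
    return max(range(NUM_CHANNELS), key=lambda a: Q[state][a])  # exploit

for episode in range(20000):
    state = random.choice(PRIORITY_CLASSES)
    action = choose(state)
    reward = step(state, action)
    # Single-step Q-Learning update (no successor state in this toy,
    # so it reduces to a contextual-bandit value estimate).
    Q[state][action] += ALPHA * (reward - Q[state][action])

best = {s: max(range(NUM_CHANNELS), key=lambda a: Q[s][a])
        for s in PRIORITY_CLASSES}
print(best)  # each class should learn to prefer the least-occupied channel
```

In this toy every class converges on the least-busy channel; the paper's full MARL setting is harder because agents contend for the same channels and must also balance latency, fairness, and energy objectives, which is where the deep methods (DQN, Dueling DQN, PPO) come in.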

Keywords: 5G; internet of things; priority-aware spectrum management; reinforcement learning; spectrum access; resource allocation
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/18/2941/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/18/2941/ (text/html)


Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:18:p:2941-:d:1747174


Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.
