Adaptive Cyber Defense Through Hybrid Learning: From Specialization to Generalization
Muhammad Omer Farooq
Muhammad Omer Farooq: HSR Innovations and Consulting Ltd., Old Quarters Ballincollig, P31 EV91 Cork, Ireland
Future Internet, 2025, vol. 17, issue 10, 1-17
Abstract:
This paper introduces a hybrid learning framework that combines Reinforcement Learning (RL) and Supervised Learning (SL) to train autonomous cyber-defense agents capable of operating effectively in dynamic and adversarial environments. The approach uses RL for strategic exploration and policy development, and SL to distill high-reward trajectories into refined policy updates, improving sample efficiency, learning stability, and robustness. The framework first targets specialized agent training, in which each agent is optimized against a specific adversarial behavior. It is then extended to train a generalized agent that learns to counter multiple, diverse attack strategies through multi-task and curriculum learning techniques. Comprehensive experiments in the CybORG simulation environment show that the hybrid RL–SL framework consistently outperforms pure RL baselines in both specialized and generalized settings: hybrid-trained agents achieve up to 23% higher cumulative rewards in specialized defense tasks and approximately 18% improvement in generalized defense scenarios, compared with RL-only agents. Incorporating temporal context into the observation space yields a further 4–6% performance gain in policy robustness. We also investigate augmenting the observation space with historical actions and rewards, which produces consistent, albeit incremental, gains in SL-based learning performance. Key contributions of this work include: (i) a novel hybrid learning paradigm that integrates RL and SL for effective cyber-defense policy learning, (ii) a scalable extension for training generalized agents across heterogeneous threat models, and (iii) an empirical analysis of the role of temporal context in agent observability and decision-making.
Collectively, the results highlight the promise of hybrid learning strategies for building intelligent, resilient, and adaptable cyber-defense systems in evolving threat landscapes.
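The abstract does not specify how high-reward trajectories are distilled or how history is added to observations. The following is a minimal sketch under one plausible reading, not the author's implementation: keep the top fraction of episodes by return (a cross-entropy-method-style filter) and take a supervised cross-entropy step that pushes a linear softmax policy toward the actions in those episodes; separately, append a one-hot previous action and the previous reward to the observation. All names (`augment_observation`, `select_elite`, `distill_step`) and parameters (`fraction`, `lr`) are hypothetical, the RL update itself is omitted, and nothing here uses CybORG.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy types: a trajectory is a list of (observation, action, reward) steps.
Step = Tuple[List[float], int, float]
Trajectory = List[Step]


def augment_observation(obs: Sequence[float], prev_action: int,
                        prev_reward: float, num_actions: int) -> List[float]:
    """Append a one-hot previous action and the previous reward (temporal context)."""
    one_hot = [0.0] * num_actions
    if prev_action >= 0:  # -1 marks "no previous action" at episode start
        one_hot[prev_action] = 1.0
    return list(obs) + one_hot + [prev_reward]


def episode_return(traj: Trajectory) -> float:
    return sum(r for _, _, r in traj)


def select_elite(trajs: List[Trajectory], fraction: float) -> List[Trajectory]:
    """Keep the highest-return trajectories (always at least one)."""
    k = max(1, int(len(trajs) * fraction))
    return sorted(trajs, key=episode_return, reverse=True)[:k]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(weights: List[List[float]], obs: Sequence[float]) -> List[float]:
    """Linear softmax policy with one weight row per action."""
    return softmax([sum(w * x for w, x in zip(row, obs)) for row in weights])


def distill_step(weights: List[List[float]], elite: List[Trajectory],
                 lr: float = 0.1) -> List[List[float]]:
    """One batch cross-entropy gradient step toward the actions in elite trajectories."""
    grads = [[0.0] * len(row) for row in weights]
    n = 0
    for traj in elite:
        for obs, action, _ in traj:
            probs = policy_probs(weights, obs)
            for a, row in enumerate(grads):
                # d(-log p[action]) / d(logit_a) = p_a - 1[a == action]
                coeff = probs[a] - (1.0 if a == action else 0.0)
                for i, x in enumerate(obs):
                    row[i] += coeff * x
            n += 1
    if n == 0:  # no steps to learn from
        return weights
    return [[w - lr * g / n for w, g in zip(wrow, grow)]
            for wrow, grow in zip(weights, grads)]
```

Under this reading, an outer training loop would alternate RL episodes (data collection and policy-gradient updates, not shown) with calls to `select_elite` and `distill_step`, so that the supervised step only imitates behavior that already earned high reward.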
Keywords: autonomous cyber operations; cyber security; defensive blue agent; supervised learning; reinforcement learning
JEL-codes: O3
Date: 2025
Downloads:
https://www.mdpi.com/1999-5903/17/10/464/pdf (application/pdf)
https://www.mdpi.com/1999-5903/17/10/464/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:17:y:2025:i:10:p:464-:d:1767507
Future Internet is currently edited by Ms. Grace You
Bibliographic data for series maintained by MDPI Indexing Manager.