Miss-Triggered Content Cache Replacement Under Partial Observability: Transformer-Decoder Q-Learning

Kim, Hakho; Sun, Teh-Jen; Huh, Eui-Nam

Hakho Kim, Teh-Jen Sun and Eui-Nam Huh
Additional contact information
Hakho Kim: Department of Artificial Intelligence, Kyung Hee University, Yongin 17104, Republic of Korea
Teh-Jen Sun: Department of Artificial Intelligence, Kyung Hee University, Yongin 17104, Republic of Korea
Eui-Nam Huh: Department of Computer Engineering, Kyung Hee University, Yongin 17104, Republic of Korea

Mathematics, 2025, vol. 13, issue 19, 1-27

Abstract: Content delivery networks (CDNs) face steadily rising, uneven demand, straining heuristic cache replacement. Reinforcement learning (RL) is promising, but most work assumes a fully observable Markov Decision Process (MDP), which is unrealistic under delayed, partial, and noisy signals. We model cache replacement as a Partially Observable MDP (POMDP) and present the Miss-Triggered Cache Transformer (MTCT), a Transformer-decoder Q-learning agent that encodes recent histories with self-attention. MTCT invokes its policy only on cache misses to align compute with informative events and uses a delayed-hit reward to propagate information from hits. A compact, rank-based action set (12 actions by default) captures popularity–recency trade-offs with complexity independent of cache capacity. We evaluate MTCT on a real trace (MovieLens) and two synthetic workloads (Mandelbrot–Zipf, Pareto) against Adaptive Replacement Cache (ARC), Windowed TinyLFU (W-TinyLFU), classical heuristics, and Double Deep Q-Network (DDQN). MTCT achieves the best or statistically comparable cache-hit rates on most cache sizes; e.g., on MovieLens at M = 600, it reaches 0.4703 (DDQN 0.4436, ARC 0.4513). Miss-triggered inference also lowers mean wall-clock time per episode; Transformer inference is well suited to modern hardware acceleration. Ablations support C_L = 50 and show that finer action grids improve stability and final accuracy.
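The control loop described in the abstract (policy invoked only on misses, hits credited later to the preceding eviction, a fixed rank-based action set) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the names `ACTIONS`, `run_episode`, and `policy`, the split of the 12 actions into two ranking criteria times six rank quantiles, and the reward defined as the count of hits between consecutive evictions. A random policy stands in for the Transformer-decoder Q-network.

```python
from collections import Counter
import random

# Hypothetical 12-action grid: 2 ranking criteria x 6 rank quantiles.
# The paper's actual action definition may differ.
CRITERIA = ("frequency", "recency")
QUANTILES = (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)
ACTIONS = [(c, q) for c in CRITERIA for q in QUANTILES]


def run_episode(requests, capacity, policy):
    """Toy miss-triggered cache loop.

    policy(history) -> action index; it is called only when a miss
    occurs and the cache is full. Returns (hit_rate, policy_calls,
    transitions), where transitions holds (action, delayed_reward).
    """
    cache, freq, last_seen = set(), Counter(), {}
    history = []        # (item, was_hit) observations visible to the policy
    transitions = []    # one entry per completed eviction decision
    pending_action, hits_since_decision = None, 0
    hits = policy_calls = 0

    for t, item in enumerate(requests):
        was_hit = item in cache
        freq[item] += 1
        if was_hit:
            hits += 1
            hits_since_decision += 1   # credited later to the last eviction
        else:
            if len(cache) >= capacity:
                # Close the previous decision with its delayed-hit reward.
                if pending_action is not None:
                    transitions.append((pending_action, hits_since_decision))
                hits_since_decision = 0
                action = policy(history)   # policy runs only on misses
                policy_calls += 1
                criterion, q = ACTIONS[action]
                score = freq if criterion == "frequency" else last_seen
                # Lowest score first; ties broken by item id for determinism.
                ranked = sorted(cache, key=lambda x: (score[x], x))
                victim = ranked[int(q * (len(ranked) - 1))]
                cache.remove(victim)
                pending_action = action
            cache.add(item)
        last_seen[item] = t
        history.append((item, was_hit))

    return hits / len(requests), policy_calls, transitions


if __name__ == "__main__":
    rng = random.Random(0)
    trace = [rng.randrange(50) for _ in range(2000)]
    # Random placeholder policy; a learned Q-network would go here.
    rate, calls, trans = run_episode(
        trace, 10, lambda history: rng.randrange(len(ACTIONS))
    )
    print(f"hit rate={rate:.3f}, policy calls={calls}, transitions={len(trans)}")
```

In this sketch the number of actions stays at 12 whatever the cache capacity, because each action selects a rank position rather than a specific cached item. The sort used to find that position still scales with cache size and would likely be replaced by an incremental structure in practice.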

Keywords: deep reinforcement learning; content cache replacement; transformer; POMDP; cache replacement
JEL-codes: C
Date: 2025
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/19/3217/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/19/3217/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:19:p:3217-:d:1766066

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.