Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems

Edson Ramiro Lucas Filho, George Savva, Lun Yang, Kebo Fu, Jianqiang Shen and Herodotos Herodotou
Additional contact information
Edson Ramiro Lucas Filho: Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus
George Savva: Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus
Lun Yang: Huawei Technologies Co., Ltd., Shenzhen 518100, China
Kebo Fu: Huawei Technologies Co., Ltd., Shenzhen 518100, China
Jianqiang Shen: Huawei Technologies Co., Ltd., Shenzhen 518100, China
Herodotos Herodotou: Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus

Future Internet, 2025, vol. 17, issue 4, 1-37

Abstract: Modern multi-tiered data storage systems optimize file access by managing data across a hybrid composition of caches and storage tiers, using policies whose decisions can severely impact the storage system’s performance. Recently, various Machine Learning (ML) algorithms have been used to model access patterns from complex workloads. Yet, current approaches train their models offline in a batch-based fashion, even though storage systems process a stream of file requests generated by dynamic workloads. In this manuscript, we advocate the streaming ML paradigm for modeling access patterns in multi-tiered storage systems, as it offers several advantages, including high efficiency, high accuracy, and high adaptability. Moreover, representative file access patterns, including temporal, spatial, length, and frequency patterns, are identified for individual files, directories, and file formats, and are used as features. Streaming ML models are developed, trained, and tested on different file system traces for making two types of predictions: the next offset to be read in a file and the future file hotness. An extensive evaluation is performed with production traces provided by Huawei Technologies, showing that the models are practical, with low memory consumption (<1.3 MB) and low training delay (<1.8 ms per training instance), and can make accurate predictions online (0.98 F1 score and 0.07 MAE on average).
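
The abstract describes models that are trained and make predictions online, one file-request instance at a time; a common way to evaluate such models is a test-then-train (prequential) loop. The paper does not name an implementation or library, so the Python sketch below is only a minimal illustration of that loop for the file-hotness classification task, using the open-source river library; the feature names (freq_1h, recency_s, size_kb) and the toy three-event stream are invented for this example and are not the paper's actual features or traces.

    # Hypothetical sketch: streaming (online) ML for the file-hotness task,
    # in a test-then-train (prequential) loop. Feature names and the toy
    # event stream are invented for illustration only; the paper's real
    # features cover temporal, spatial, length, and frequency patterns
    # per file, directory, and file format.
    from river import compose, linear_model, metrics, preprocessing

    # Online pipeline: scale features incrementally, then update a logistic
    # regression one instance at a time (no offline batch training).
    model = compose.Pipeline(
        preprocessing.StandardScaler(),
        linear_model.LogisticRegression(),
    )
    f1 = metrics.F1()

    # Toy stream of (features, label) pairs; label True means the file is
    # expected to be "hot" (frequently accessed) in the next time window.
    stream = [
        ({"freq_1h": 12, "recency_s": 30,    "size_kb": 64},   True),
        ({"freq_1h": 0,  "recency_s": 86400, "size_kb": 2048}, False),
        ({"freq_1h": 7,  "recency_s": 120,   "size_kb": 128},  True),
    ]

    for x, y in stream:
        y_pred = model.predict_one(x)  # test on the incoming instance first...
        f1.update(y, y_pred)           # ...score the prediction...
        model.learn_one(x, y)          # ...then train on that same instance

    print(f1)  # prequential F1 over the stream processed so far

A regression model (for example, river's linear_model.LinearRegression evaluated with metrics.MAE) would slot into the same loop for the next-offset prediction task. The property the abstract highlights, namely that each instance is processed once and then discarded, is what keeps memory consumption and per-instance training delay bounded.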

Keywords: multi-tiered data storage systems; streaming machine learning; workload patterns
JEL-codes: O3
Date: 2025

Downloads: (external link)
https://www.mdpi.com/1999-5903/17/4/170/pdf (application/pdf)
https://www.mdpi.com/1999-5903/17/4/170/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:17:y:2025:i:4:p:170-:d:1633050

Future Internet is currently edited by Ms. Grace You

More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Handle: RePEc:gam:jftint:v:17:y:2025:i:4:p:170-:d:1633050