Assessing the impact of bag‐of‐words versus word‐to‐vector embedding methods and dimension reduction on anomaly detection from log files

Ziyu Qiu, Zhilei Zhou, Bradley Niblett, Andrew Johnston, Jeffrey Schwartzentruber, Nur Zincir‐Heywood and Malcolm I. Heywood

International Journal of Network Management, 2024, vol. 34, issue 1

Abstract: In terms of cyber security, log files represent a rich source of information regarding the state of a computer service/system. Automating the process of summarizing log file content represents an important aid for decision‐making, especially given the 24/7 nature of network/service operations. We perform benchmarking over eight distinct log files in order to assess the impact of the following: (1) different embedding methods for developing semantic descriptions of the original log files, (2) applying dimension reduction to the high‐dimensional semantic space, and (3) using different unsupervised learning algorithms for providing a visual summary of the service state. Benchmarking demonstrates that (1) word‐to‐vector embeddings identified by bidirectional encoder representations from transformers (BERT) without “fine‐tuning” are sufficient to match the performance of Bag‐of‐Words embeddings provided by term frequency‐inverse document frequency (TF‐IDF) and (2) the self‐organizing map without dimension reduction provides the most effective anomaly detector.
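
As a rough illustration of the kind of pipeline the abstract describes, the sketch below embeds log lines with TF‐IDF and trains a small self‐organizing map (SOM) directly on those vectors, with no dimension reduction. It is not the authors' implementation: the toy log lines, the function names, the 2x2 map size, the learning schedule, and the use of quantization error (distance to the best‐matching unit) as an anomaly score are all assumptions made for illustration. BERT embeddings and the dimension reduction step are not shown.

```python
import math
import random
from collections import Counter


def tfidf_vectors(lines):
    """Return one L2-normalised TF-IDF vector per whitespace-tokenised line."""
    docs = [line.lower().split() for line in lines]
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(docs)
    # Document frequency: number of lines in which each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(doc).items():
            # Smoothed IDF, so tokens present in every line keep a nonzero weight.
            idf = math.log((1 + n) / (1 + df[tok])) + 1.0
            vec[index[tok]] = count * idf
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, rows=2, cols=2, epochs=50, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Initialise each unit from a randomly chosen training vector.
    units = {(r, c): list(rng.choice(data))
             for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs      # decays linearly from 1 towards 0
        lr = 0.5 * frac                  # learning rate
        radius = frac + 0.1              # neighbourhood width on the grid
        for x in rng.sample(data, len(data)):
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = min(units, key=lambda k: sq_dist(units[k], x))
            for (r, c), w in units.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
    return units


def anomaly_score(units, x):
    """Quantization error: Euclidean distance from x to its best-matching unit."""
    return math.sqrt(min(sq_dist(w, x) for w in units.values()))


# Hypothetical log lines: six routine entries followed by one unusual entry.
lines = [
    "INFO session opened for user alice",
    "INFO session closed for user alice",
    "INFO session opened for user bob",
    "INFO session closed for user bob",
    "INFO session opened for user carol",
    "INFO session closed for user carol",
    "ERROR kernel panic stack overflow detected",
]
vectors = tfidf_vectors(lines)
units = train_som(vectors[:-1])          # train on the routine lines only
scores = [anomaly_score(units, v) for v in vectors]
for line, score in zip(lines, scores):
    print(f"{score:.3f}  {line}")
```

In this toy setting the last line receives the largest score, because it shares no tokens with the lines the map was trained on. Real log files are far less cleanly separated, which is the situation the benchmarking in the paper addresses.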

Date: 2024

Downloads: (external link)
https://doi.org/10.1002/nem.2251

Persistent link: https://EconPapers.repec.org/RePEc:wly:intnem:v:34:y:2024:i:1:n:e2251

More articles in International Journal of Network Management from John Wiley & Sons

 
Handle: RePEc:wly:intnem:v:34:y:2024:i:1:n:e2251