EconPapers    
Economics at your fingertips  
 

Design and Evaluation of Unsupervised Machine Learning Models for Anomaly Detection in Streaming Cybersecurity Logs

Carmen Sánchez-Zas (), Xavier Larriva-Novo, Víctor A. Villagrá, Mario Sanz Rodrigo and José Ignacio Moreno
Additional contact information
Carmen Sánchez-Zas: ETSI Telecomunicación, Universidad Politécnica de Madrid (UPM), Avda. Complutense 30, 28040 Madrid, Spain
Xavier Larriva-Novo: ETSI Telecomunicación, Universidad Politécnica de Madrid (UPM), Avda. Complutense 30, 28040 Madrid, Spain
Víctor A. Villagrá: ETSI Telecomunicación, Universidad Politécnica de Madrid (UPM), Avda. Complutense 30, 28040 Madrid, Spain
Mario Sanz Rodrigo: ETSI Telecomunicación, Universidad Politécnica de Madrid (UPM), Avda. Complutense 30, 28040 Madrid, Spain
José Ignacio Moreno: ETSI Telecomunicación, Universidad Politécnica de Madrid (UPM), Avda. Complutense 30, 28040 Madrid, Spain

Mathematics, 2022, vol. 10, issue 21, 1-30

Abstract: Companies, institutions or governments process large amounts of data for the development of their activities. This knowledge usually comes from devices that collect data from various sources. Processing them in real time is essential to ensure the flow of information about the current state of infrastructure, as this knowledge is the basis for management and decision making in the event of an attack or anomalous situations. Therefore, this article exposes three unsupervised machine learning models based on clustering techniques and threshold definitions to detect anomalies from heterogeneous streaming cybersecurity data sources. After evaluation, this paper presents a case of heterogeneous cybersecurity devices, comparing WSSSE, Silhouette and training time metrics for all models, where K-Means was defined as the optimal algorithm for anomaly detection in streaming data processing. The anomaly detection’s accuracy achieved is also significantly high. A comparison with other research studies is also performed, against which the proposed method proved its strong points.

Keywords: machine learning; clustering; real-time; data pre-processing; threshold; Spark; cybersecurity; K-means; anomaly detection; logs (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2022
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/21/4043/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/21/4043/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:21:p:4043-:d:958802

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:10:y:2022:i:21:p:4043-:d:958802