Model Retraining upon Concept Drift Detection in Network Traffic Big Data

Sikha S. Bagui, Mohammad Pale Khan, Chedlyne Valmyr, Subhash C. Bagui and Dustin Mink
Additional contact information
Sikha S. Bagui: Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA
Mohammad Pale Khan: Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA
Chedlyne Valmyr: Department of Computer Science, The University of West Florida, Pensacola, FL 32514, USA
Subhash C. Bagui: Department of Mathematics and Statistics, The University of West Florida, Pensacola, FL 32514, USA
Dustin Mink: Department of Cybersecurity, The University of West Florida, Pensacola, FL 32514, USA

Future Internet, 2025, vol. 17, issue 8, 1-23

Abstract: This paper presents a comprehensive model for detecting and addressing concept drift in network security data using the Isolation Forest algorithm. The approach leverages Isolation Forest’s inherent ability to efficiently isolate anomalies in high-dimensional data, making it suitable for adapting to shifting data distributions in dynamic environments. Anomalies in network attack data may not occur in large numbers, so it is important to be able to detect anomalies even with small batch sizes. The novelty of this work lies in successfully detecting anomalies even with small batch sizes and identifying the point at which incremental retraining needs to be started. Triggering retraining early also keeps the model in sync with the latest data, reducing the chance for attacks to be successfully conducted. Our methodology implements an end-to-end workflow that continuously monitors incoming data and detects distribution changes using Isolation Forest, then manages model retraining using Random Forest to maintain optimal performance. We evaluate our approach using UWF-ZeekDataFall22, a newly created dataset that analyzes Zeek’s Connection Logs collected through Security Onion 2 network security monitor and labeled using the MITRE ATT&CK framework. Incremental as well as full retraining are analyzed using Random Forest. There was a steady increase in the model’s performance with incremental retraining and a positive impact on the model’s performance with full model retraining.
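
The workflow summarized in the abstract (monitor incoming batches with Isolation Forest, then retrain a Random Forest once drift is flagged) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it assumes scikit-learn and NumPy, uses synthetic Gaussian data in place of UWF-ZeekDataFall22, and the names `drift_detected`, `retrain_incrementally`, and `ANOMALY_FRACTION_THRESHOLD`, the batch size of 40, and the use of `warm_start` to approximate incremental retraining are all assumptions introduced here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

# Hypothetical trigger: share of a batch flagged as anomalous (not from the paper).
ANOMALY_FRACTION_THRESHOLD = 0.5


def drift_detected(detector, batch, threshold=ANOMALY_FRACTION_THRESHOLD):
    """Return True when the fraction of rows flagged as anomalies exceeds threshold."""
    flags = detector.predict(batch)  # scikit-learn convention: -1 anomaly, 1 inlier
    return float(np.mean(flags == -1)) > threshold


def retrain_incrementally(classifier, X_new, y_new, extra_trees=25):
    """Keep the existing trees and add extra_trees fitted on the new batch.

    Assumes the classifier was created with warm_start=True and that the new
    batch contains the same set of classes as the original training data.
    """
    classifier.set_params(n_estimators=classifier.n_estimators + extra_trees)
    classifier.fit(X_new, y_new)
    return classifier


rng = np.random.default_rng(0)

# Synthetic reference window standing in for historical connection-log features.
X_ref = rng.normal(0.0, 1.0, size=(500, 4))
y_ref = (X_ref[:, 0] > 0.0).astype(int)

detector = IsolationForest(contamination=0.05, random_state=0).fit(X_ref)
classifier = RandomForestClassifier(
    n_estimators=50, warm_start=True, random_state=0
).fit(X_ref, y_ref)

# Two small incoming batches: one from the reference distribution, one shifted.
stable_batch = rng.normal(0.0, 1.0, size=(40, 4))
drifted_batch = rng.normal(6.0, 1.0, size=(40, 4))

for name, X_batch, center in [("stable", stable_batch, 0.0),
                              ("drifted", drifted_batch, 6.0)]:
    if drift_detected(detector, X_batch):
        # Synthetic labels for the demo; real labels would come from the data set.
        y_batch = (X_batch[:, 0] > center).astype(int)
        retrain_incrementally(classifier, X_batch, y_batch)
        print(f"{name}: drift flagged, forest now has "
              f"{len(classifier.estimators_)} trees")
    else:
        print(f"{name}: no drift flagged")
```

A full retraining variant would instead fit a fresh `RandomForestClassifier` on the combined old and new data; which option suits a given deployment depends on how quickly older traffic patterns stop being representative.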

Keywords: concept drift; isolation forest; anomaly detection; network security; machine learning; data streaming; big data analytics; model retraining; drift detection; network traffic analysis
JEL-codes: O3
Date: 2025

Downloads: (external link)
https://www.mdpi.com/1999-5903/17/8/328/pdf (application/pdf)
https://www.mdpi.com/1999-5903/17/8/328/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:17:y:2025:i:8:p:328-:d:1709399

Future Internet is currently edited by Ms. Grace You

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-07-25
Handle: RePEc:gam:jftint:v:17:y:2025:i:8:p:328-:d:1709399