Data Quality Tools to Enhance a Network Anomaly Detection Benchmark

José Camacho and Rafael A. Rodríguez-Gómez
Additional contact information
José Camacho: Research Centre for Information and Communication Technologies (CITIC-UGR), University of Granada, 18014 Granada, Spain
Rafael A. Rodríguez-Gómez: Research Centre for Information and Communication Technologies (CITIC-UGR), University of Granada, 18014 Granada, Spain

Data, 2025, vol. 10, issue 3, 1-15

Abstract: Network traffic datasets are essential for the construction of traffic models, often using machine learning (ML) techniques. Among other applications, these models can be employed to solve complex optimization problems or to identify anomalous behaviors, i.e., behaviors that deviate from the established model. However, the performance of the ML model depends, among other factors, on the quality of the data used to train it. Benchmark datasets, with a profound impact on research findings, are often assumed to be of good quality by default. In this paper, we derive four variants of a benchmark dataset in network anomaly detection (UGR’16, a flow-based real-world traffic dataset designed for anomaly detection), and show that the choice among variants has a larger impact on model performance than the ML technique used to build the model. To analyze this phenomenon, we propose a methodology to investigate the causes of these differences and to assess the quality of the data labeling. Our results underline the importance of paying more attention to data quality assessment in network anomaly detection.

Keywords: Netflow; UGR’16; anomaly detection; data quality
JEL-codes: C8 C80 C81 C82 C83
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2306-5729/10/3/33/pdf (application/pdf)
https://www.mdpi.com/2306-5729/10/3/33/ (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:10:y:2025:i:3:p:33-:d:1599152


Data is currently edited by Ms. Cecilia Yang

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-22
Handle: RePEc:gam:jdataj:v:10:y:2025:i:3:p:33-:d:1599152