Data Quality Tools to Enhance a Network Anomaly Detection Benchmark
José Camacho and
Rafael A. Rodríguez-Gómez
Additional contact information
José Camacho: Research Centre for Information and Communication Technologies (CITIC-UGR), University of Granada, 18014 Granada, Spain
Rafael A. Rodríguez-Gómez: Research Centre for Information and Communication Technologies (CITIC-UGR), University of Granada, 18014 Granada, Spain
Data, 2025, vol. 10, issue 3, 1-15
Abstract:
Network traffic datasets are essential for the construction of traffic models, often using machine learning (ML) techniques. Among other applications, these models can be employed to solve complex optimization problems or to identify anomalous behaviors, i.e., behaviors that deviate from the established model. However, the performance of the ML model depends, among other factors, on the quality of the data used to train it. Benchmark datasets, with a profound impact on research findings, are often assumed to be of good quality by default. In this paper, we derive four variants of a benchmark dataset in network anomaly detection (UGR’16, a flow-based real-world traffic dataset designed for anomaly detection), and show that the choice among variants has a larger impact on model performance than the ML technique used to build the model. To analyze this phenomenon, we propose a methodology to investigate the causes of these differences and to assess the quality of the data labeling. Our results underline the importance of paying more attention to data quality assessment in network anomaly detection.
Keywords: Netflow; UGR’16; anomaly detection; data quality
JEL-codes: C8 C80 C81 C82 C83
Date: 2025
Downloads:
https://www.mdpi.com/2306-5729/10/3/33/pdf (application/pdf)
https://www.mdpi.com/2306-5729/10/3/33/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:10:y:2025:i:3:p:33-:d:1599152
Data is currently edited by Ms. Cecilia Yang
Bibliographic data for series maintained by MDPI Indexing Manager.