EconPapers    
Economics at your fingertips  
 

Navigating Data Corruption in Machine Learning: Balancing Quality, Quantity, and Imputation Strategies

Qi Liu and Wanjing Ma
Additional contact information
Qi Liu: Key Laboratory of Road and Traffic Engineering of the Ministry of Education, College of Transportation, Tongji University, Shanghai 200092, China
Wanjing Ma: Key Laboratory of Road and Traffic Engineering of the Ministry of Education, College of Transportation, Tongji University, Shanghai 200092, China

Future Internet, 2025, vol. 17, issue 6, 1-21

Abstract: Data corruption, including missing and noisy entries, is a common challenge in real-world machine learning. This paper examines its impact and mitigation strategies through two experimental setups: supervised NLP tasks (NLP-SL) and deep reinforcement learning for traffic signal control (Signal-RL). The study analyzes how varying corruption levels affect model performance, evaluates imputation strategies, and assesses whether expanding datasets can counteract corruption effects. The results indicate that performance degradation follows a diminishing-return pattern that is well modeled by an exponential function. Noisy data harm performance more than missing data, especially in sequential tasks such as Signal-RL, where errors may compound. Imputation helps recover missing data but can introduce noise; its effectiveness depends on corruption severity and imputation accuracy. The study identifies clear boundaries between the conditions under which imputation is beneficial and those under which it is harmful, and classifies tasks as either noise-sensitive or noise-insensitive. Larger datasets reduce corruption effects but offer diminishing gains at high corruption levels. These insights guide the design of robust systems, emphasizing smart data collection, imputation decisions, and preprocessing strategies in noisy environments.
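The abstract reports that degradation follows a diminishing-return pattern well modeled by an exponential function, but this page does not give the exact functional form. The sketch below assumes the form P(r) = a·exp(−b·r) + c, where r is the corruption ratio, a + c is clean-data performance and c is the floor that performance decays toward; with b > 0, each additional increment of corruption costs less than the previous one. The function names `exp_model` and `fit_exp_model` are hypothetical, the fitting approach (grid search over b with least squares for a and c) is an illustrative choice rather than the authors' method, and the data are synthetic, not results from the paper.

```python
import math


def exp_model(r, a, b, c):
    """Hypothetical performance curve P(r) = a * exp(-b * r) + c.

    r is the corruption ratio in [0, 1]. With a > 0 and b > 0, performance
    starts at a + c on clean data and decays toward the floor c.
    """
    return a * math.exp(-b * r) + c


def fit_exp_model(rates, scores, b_grid=None):
    """Fit P(r) = a * exp(-b * r) + c by grid search over the decay rate b.

    For a fixed b the model is linear in (a, c), so a and c come from
    ordinary least squares of the scores on x = exp(-b * r).
    Returns (a, b, c, sse) for the grid value of b with the lowest error.
    """
    if b_grid is None:
        b_grid = [i / 100 for i in range(1, 1001)]  # b from 0.01 to 10.00
    n = len(rates)
    my = sum(scores) / n
    best = None
    for b in b_grid:
        xs = [math.exp(-b * r) for r in rates]
        mx = sum(xs) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0.0:
            continue  # all x identical: slope a is not identifiable
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, scores))
        a = sxy / sxx
        c = my - a * mx
        sse = sum((y - (a * x + c)) ** 2 for x, y in zip(xs, scores))
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best


# Synthetic curve generated from known parameters (illustrative only,
# not results from the paper): a = 0.30, b = 4.0, c = 0.55.
rates = [i / 10 for i in range(9)]  # corruption ratios 0.0 .. 0.8
scores = [exp_model(r, 0.30, 4.0, 0.55) for r in rates]
a, b, c, sse = fit_exp_model(rates, scores)

# Diminishing returns: the loss from each extra 0.1 of corruption shrinks.
drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
print(f"a={a:.3f} b={b:.2f} c={c:.3f}")
```

Because the model is linear in a and c once b is fixed, only the single nonlinear parameter needs a search; on real measurements one would expect a nonzero residual and would typically add noise-aware checks before reading much into the fitted b.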

Keywords: data corruption; missing data; noisy data; data imputation; model robustness; deep reinforcement learning
JEL-codes: O3
Date: 2025

Downloads: (external link)
https://www.mdpi.com/1999-5903/17/6/241/pdf (application/pdf)
https://www.mdpi.com/1999-5903/17/6/241/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:17:y:2025:i:6:p:241-:d:1667785

Future Internet is currently edited by Ms. Grace You

More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-05-30
Handle: RePEc:gam:jftint:v:17:y:2025:i:6:p:241-:d:1667785