The Effect of Preprocessing Techniques, Applied to Numeric Features, on Classification Algorithms’ Performance
Esra’a Alshdaifat, Doa’a Alshdaifat, Ayoub Alsarhan, Fairouz Hussein and Subhieh Moh’d Faraj S. El-Salhi
Additional contact information
Esra’a Alshdaifat: Department of Computer Information System, Faculty of Prince Al-Hussein Bin Abdallah II For Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan
Doa’a Alshdaifat: Department of Computer Information System, Faculty of Prince Al-Hussein Bin Abdallah II For Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan
Ayoub Alsarhan: Department of Computer Information System, Faculty of Prince Al-Hussein Bin Abdallah II For Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan
Fairouz Hussein: Department of Computer Information System, Faculty of Prince Al-Hussein Bin Abdallah II For Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan
Subhieh Moh’d Faraj S. El-Salhi: Department of Computer Information System, Faculty of Prince Al-Hussein Bin Abdallah II For Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan
Data, 2021, vol. 6, issue 2, 1-23
Abstract:
The performance of a prediction model depends on several factors, and the preprocessing techniques adopted are among the most significant: preprocessing is essential to building an effective and efficient classification model. This paper investigates the impact of the most widely used preprocessing techniques for numeric features on the performance of classification algorithms. Combinations of normalization techniques and missing-value handling strategies are assessed on eighteen benchmark datasets using two well-known classification algorithms, several performance evaluation metrics, and statistical significance tests. The reported experimental results show that the impact of the adopted preprocessing techniques varies from one classification algorithm to another, and that the differences between the considered preprocessing techniques are statistically significant.
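As a rough illustration of what combining a missing-value strategy with a normalization technique can look like on a single numeric feature, the following minimal sketch may help. The abstract does not name the specific techniques studied, so mean imputation, min-max scaling, and z-score standardization are assumptions chosen here as common examples, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values.

    Assumption: mean imputation is only one possible missing-value strategy.
    """
    observed = [x for x in column if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in column]


def min_max(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def z_score(column):
    """Center values on the mean and divide by the population std. deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical numeric feature with one missing value.
feature = [2.0, None, 4.0, 6.0]

filled = impute_mean(feature)      # missing value handled first
scaled = min_max(filled)           # combination 1: mean imputation + min-max
standardized = z_score(filled)     # combination 2: mean imputation + z-score
```

Each pairing of an imputation function with a normalization function gives one preprocessing combination; a study of this kind would presumably train and evaluate a classifier on the output of each combination and compare the resulting scores.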
Keywords: preprocessing; classification algorithms; normalization; missing values; classification performance; data cleaning
JEL-codes: C8 C80 C81 C82 C83
Date: 2021
Downloads:
https://www.mdpi.com/2306-5729/6/2/11/pdf (application/pdf)
https://www.mdpi.com/2306-5729/6/2/11/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:6:y:2021:i:2:p:11-:d:484845
Data is currently edited by Ms. Cecilia Yang