Adversarial Approaches to Tackle Imbalanced Data in Machine Learning

Ayoub, Shahnawaz; Gulzar, Yonis; Rustamov, Jaloliddin; Jabbari, Abdoh; Reegu, Faheem Ahmad; Turaev, Sherzod

Shahnawaz Ayoub, Yonis Gulzar, Jaloliddin Rustamov, Abdoh Jabbari, Faheem Ahmad Reegu and Sherzod Turaev
Author affiliations:
Shahnawaz Ayoub: Department of Computer Science and Engineering, Shri Venkateshwara University, NH-24, Venkateshwara Nagar, Gajraula 244236, Uttar Pradesh, India
Yonis Gulzar: Department of Management Information Systems, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia
Jaloliddin Rustamov: Health Data Science Lab, Department of Genetics and Genomics, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain 15551, United Arab Emirates
Abdoh Jabbari: Department of Computer Science and Information Technology, Jazan University, Jazan 45142, Saudi Arabia
Faheem Ahmad Reegu: Department of Computer Science and Information Technology, Jazan University, Jazan 45142, Saudi Arabia
Sherzod Turaev: Department of Computer Science & Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain 15551, United Arab Emirates

Sustainability, 2023, vol. 15, issue 9, 1-17

Abstract: Real-world applications often involve imbalanced datasets, in which examples are unevenly distributed across classes. When a system must achieve high accuracy, classifier performance is crucial. However, imbalanced datasets can lead to poor classification performance, and conventional remedies such as the synthetic minority oversampling technique (SMOTE) may not fully resolve the problem. This study therefore proposes balancing the datasets using adversarial learning methods such as generative adversarial networks (GANs). It evaluates the effect of data augmentation on both balanced and imbalanced datasets, measuring classification performance on three different datasets and applying data augmentation techniques to generate synthetic data for the minority class. Before augmentation, a decision tree classifier achieved accuracies of 79.9%, 94.1%, and 72.6% on the three datasets. When a decision tree was used to evaluate the augmented data, the proposed approach achieved accuracies of 82.7%, 95.7%, and 76%, respectively, on these highly imbalanced datasets. This study demonstrates the potential of data augmentation to improve classification performance on imbalanced datasets.
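The abstract names SMOTE as the conventional technique for oversampling the minority class. As a point of reference only, the following is a minimal standard-library sketch of SMOTE-style interpolation. It is not the authors' GAN-based method, and the function name `smote`, its parameters and defaults, and the toy data are illustrative assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic new minority-class points by SMOTE-style interpolation.

    `minority` is a list of equal-length numeric tuples (one per sample).
    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    (Illustrative sketch; names and defaults are assumptions.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)              # seeded for reproducibility
    k = min(k, len(minority) - 1)          # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples (Euclidean)
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()                 # uniform in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy example (hypothetical data): 4 minority samples vs. 20 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 20
new_points = smote(minority, n_synthetic=majority_size - len(minority), k=3, seed=0)
balanced_minority = minority + new_points
print(len(balanced_minority))  # 20
```

Because each new point is interpolated between two existing minority samples, it stays inside the region the minority class already occupies. In the approach the abstract describes, a GAN generates the synthetic minority samples instead, and a decision tree is then used to evaluate classification on the augmented data.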

Keywords: computer vision; machine learning; deep learning; imbalanced dataset
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2023

Downloads:
https://www.mdpi.com/2071-1050/15/9/7097/pdf (application/pdf)
https://www.mdpi.com/2071-1050/15/9/7097/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:15:y:2023:i:9:p:7097-:d:1131143

Sustainability is currently edited by Ms. Alexandra Wu

More articles in Sustainability from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.