EconPapers    
Economics at your fingertips  
 

Data Imbalance in Autism Pre-Diagnosis Classification Systems: An Experimental Study

Neda Abdelhamid, Arun Padmavathy, David Peebles, Fadi Thabtah and Daymond Goulder-Horobin
Additional contact information
Neda Abdelhamid: IT Programme, Auckland Institute of Studies, Auckland, New Zealand
Arun Padmavathy: Digital Technologies, Manukau Institute of Technology, Auckland, New Zealand
David Peebles: Department of Psychology, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK
Fadi Thabtah: Digital Technologies, Manukau Institute of Technology, Auckland, New Zealand
Daymond Goulder-Horobin: Digital Technologies, Manukau Institute of Technology, Auckland, New Zealand

Journal of Information & Knowledge Management (JIKM), 2020, vol. 19, issue 01, 1-16

Abstract: Machine learning (ML) is a branch of computer science that is rapidly gaining popularity in healthcare because of its ability to explore large datasets and discover useful patterns that can be interpreted for decision-making and prediction. ML techniques are used to analyse clinical parameters and their combinations for prognosis, therapy planning and support, and patient management and wellbeing. In this research, we investigate a crucial problem in medical applications such as autism spectrum disorder (ASD) screening: class imbalance, in which one class (cases or controls) far outnumbers the other in the dataset. In autism diagnosis data, most instances may belong to one class (here, the no-ASD class is larger than the ASD class), which can cause performance issues such as models favouring the majority class and neglecting the minority class. This research experimentally measures the impact of class imbalance on the performance of different classifiers on real autism datasets when various data-imbalance approaches are applied in the pre-processing phase. We employ oversampling techniques, such as the Synthetic Minority Oversampling Technique (SMOTE), and undersampling with different classifiers, including Naive Bayes, RIPPER, C4.5 and Random Forest, and measure their impact on the performance of the derived models in terms of area under the curve (AUC) and other metrics. The results indicate that oversampling techniques outperform undersampling techniques, at least for the toddlers’ autism dataset that we consider, and suggest that further work should combine sampling techniques with feature selection to generate models that do not overfit the dataset.
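The abstract contrasts oversampling (SMOTE) with undersampling and compares models by AUC. The following is a minimal, illustrative sketch of those three ideas in standard-library Python; it is not the authors' implementation, since the listing does not say which software or settings they used. Assumptions: "undersampling" is taken to mean plain random undersampling, SMOTE is the usual nearest-neighbour interpolation scheme, AUC is the ROC AUC computed from pairwise comparisons, and the function names (`random_undersample`, `smote`, `auc`), the toy data and parameters such as `k` are hypothetical. The classifiers named in the abstract (Naive Bayes, RIPPER, C4.5, Random Forest) are not reproduced here.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows (without replacement) so that every class keeps
    only as many rows as the smallest class. Assumed undersampling variant."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.sample(rows, target):
            X_out.append(list(X[i]))
            y_out.append(label)
    return X_out, y_out


def smote(X_min, n_new, k=5, seed=0):
    """SMOTE-style oversampling (hypothetical helper): pick a minority row,
    pick one of its k nearest minority neighbours (Euclidean distance), and
    return a random point on the line segment between the two."""
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    rng = random.Random(seed)
    k = min(k, len(X_min) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        # Rank the other minority rows by distance to `base`.
        others = [row for j, row in enumerate(X_min) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def auc(scores, labels, positive=1):
    """ROC AUC as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: six majority rows (label 0), two minority rows (label 1).
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 2], [2, 0], [5, 5], [6, 6]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]

    # Undersampling shrinks the majority class to the minority size.
    X_under, y_under = random_undersample(X, y)
    print(sorted(y_under))  # [0, 0, 1, 1]

    # Oversampling adds synthetic minority rows instead of discarding data.
    minority = [row for row, lab in zip(X, y) if lab == 1]
    X_over = X + smote(minority, n_new=4, k=1)
    y_over = y + [1] * 4
    print(Counter(y_over)[0], Counter(y_over)[1])  # 6 6

    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The sketch makes the trade-off in the abstract concrete: undersampling balances the classes by discarding majority rows, while SMOTE keeps every original row and adds interpolated minority rows. In an evaluation of this kind, resampling would normally be applied only to the training portion of each split, so that synthetic rows never appear in the data used to compute AUC.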

Keywords: Autism spectrum disorder; ASD screening; data imbalance; machine learning; undersampling; oversampling; SMOTE
Date: 2020
Citations: 16 in EconPapers

Downloads: (external link)
https://www.worldscientific.com/doi/abs/10.1142/S0219649220400146
Access to full text is restricted to subscribers



Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:19:y:2020:i:01:n:s0219649220400146


DOI: 10.1142/S0219649220400146


Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Page updated 2025-03-20
Handle: RePEc:wsi:jikmxx:v:19:y:2020:i:01:n:s0219649220400146