Machine learning-based imputation soft computing approach for large missing scale and non-reference data imputation
A.H. Alamoodi,
B.B. Zaidan,
A.A. Zaidan,
O.S. Albahri,
Juliana Chen,
M.A. Chyad,
Salem Garfan and
A.M. Aleesa
Chaos, Solitons & Fractals, 2021, vol. 151, issue C
Abstract:
Missing data is a common problem in real-world datasets and is among the most complex topics in computer science and many other research domains. The common ways to cope with missing values are elimination or imputation, depending on the volume of the missing data and the nature of its distribution. It is therefore imperative to develop new imputation approaches along with efficient algorithms. Most existing imputation methods focus on a moderate amount of missing data; imputation for high missing rates, over 80%, remains important but challenging. The few works that address high missing volumes mostly rely on a reference dataset (a complete dataset used for evaluation): they create artificial missing values in it and impute them to measure the accuracy of their proposed techniques. So far, the option of imputing high proportions of missing values with no reference comparison dataset (an original dataset with a high proportion of missing values) has often been ignored or overlooked. Therefore, we propose a missing data imputation approach for high volumes of missing values with no reference comparison dataset. The approach applies pre-processing measures, breaks the dataset into small continuous non-missing portions, and then uses multi-criteria decision-making analysis to select a portion of data that is representative of all the portions. This portion helps to create reference comparisons and expands the missing dataset through artificial missing-value generation at different percentages, followed by imputation using different machine learning techniques. This study conducted two experiments using BMI datasets with more than 80% missing values, derived from the National Child Development Centre (NCDRC) at Sultan Idris Education University (UPSI), Malaysia. The results show the capability of our approach in reconstructing datasets with large proportions of missing values.
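The pipeline outlined in the abstract (split the data into continuous non-missing portions, pick one portion as a reference, hide values in it at several rates, impute, and score against the known truth) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: the function names are hypothetical, the longest portion stands in for the multi-criteria decision-making selection, linear interpolation stands in for the machine learning imputers, and the sample values are invented rather than taken from the NCDRC data.

```python
import math
import random


def non_missing_segments(values):
    """Return (start, end) index pairs of maximal runs without missing (None) entries."""
    segments, start = [], None
    for i, v in enumerate(values):
        if v is not None and start is None:
            start = i
        elif v is None and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(values)))
    return segments


def mask_at_random(segment, rate, seed=0):
    """Hide roughly `rate` of the interior points; endpoints stay so interpolation is defined."""
    rng = random.Random(seed)
    interior = list(range(1, len(segment) - 1))
    hidden = set(rng.sample(interior, int(round(rate * len(interior)))))
    masked = [None if i in hidden else v for i, v in enumerate(segment)]
    return masked, sorted(hidden)


def interpolate(masked):
    """Fill None entries by linear interpolation (a simple stand-in for an ML imputer)."""
    filled = list(masked)
    known = [i for i, v in enumerate(masked) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            filled[i] = masked[left] + t * (masked[right] - masked[left])
    return filled


def rmse(truth, estimate, indices):
    """Root-mean-square error over the artificially hidden positions only."""
    if not indices:
        return 0.0
    return math.sqrt(sum((truth[i] - estimate[i]) ** 2 for i in indices) / len(indices))


# Invented series with many gaps (None marks a missing value).
series = [None, 18.2, None, None, 17.0, 17.5, 18.0, 18.5, 19.0, 19.5, None, None, 20.1, None]

# Assumption: the longest complete portion is used as the reference portion.
start, end = max(non_missing_segments(series), key=lambda s: s[1] - s[0])
reference = series[start:end]

for rate in (0.25, 0.5, 0.75):
    masked, hidden = mask_at_random(reference, rate)
    error = rmse(reference, interpolate(masked), hidden)
    print(f"missing rate {rate:.0%}: RMSE on hidden values = {error:.4f}")
```

Because the reference portion is fully observed, the error at each artificial missing rate can be measured directly, which is what makes evaluation possible when no complete reference dataset exists.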
Keywords: Missing data; Missing values; Imputation; Pre-processing; Large missing
Date: 2021
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0960077921005907
Full text for ScienceDirect subscribers only
Persistent link: https://EconPapers.repec.org/RePEc:eee:chsofr:v:151:y:2021:i:c:s0960077921005907
DOI: 10.1016/j.chaos.2021.111236
Chaos, Solitons & Fractals is currently edited by Stefano Boccaletti and Stelios Bekiros