A Two-Step Data Normalization Approach for Improving Classification Accuracy in the Medical Diagnosis Domain

Izonin, Ivan; Tkachenko, Roman; Shakhovska, Nataliya; Ilchyshyn, Bohdan; Singh, Krishna Kant

Ivan Izonin, Roman Tkachenko, Nataliya Shakhovska, Bohdan Ilchyshyn and Krishna Kant Singh
Additional contact information
Ivan Izonin: Department of Artificial Intelligence, Lviv Polytechnic National University, 79013 Lviv, Ukraine
Roman Tkachenko: Department of Publishing Information Technologies, Lviv Polytechnic National University, 79013 Lviv, Ukraine
Nataliya Shakhovska: Department of Artificial Intelligence, Lviv Polytechnic National University, 79013 Lviv, Ukraine
Bohdan Ilchyshyn: Department of Artificial Intelligence, Lviv Polytechnic National University, 79013 Lviv, Ukraine
Krishna Kant Singh: Department of Computer Science and Engineering, Jain (Deemed to Be University), Bangalore 560069, India

Mathematics, 2022, vol. 10, issue 11, 1-18

Abstract: Data normalization is one of the first preprocessing tasks performed in intelligent data analysis, particularly for tabular data. Its importance stems from the need to reduce the sensitivity of the artificial intelligence model to the values of the features in the dataset and thereby increase the model’s adequacy. This paper focuses on effective data preprocessing to improve the accuracy of intelligent analysis in medical diagnostic tasks. We developed a new two-step method for normalizing numerical medical datasets. It considers both the interdependencies between the features of each observation in the dataset and their absolute values to improve accuracy in medical data mining tasks. We describe and substantiate each step of the method's algorithmic implementation and visualize its results. The proposed method was evaluated with six decision-tree-based machine learning methods on binary and multiclass classification tasks. The experiments used six real-world, freely available medical datasets with different numbers of vectors, attributes, and classes. We compared the effectiveness of the developed method with that of five existing data normalization methods. The experiments showed that the developed method increases the accuracy of the Decision Tree and Extra Trees classifiers by 1–5% in binary classification and the accuracy of the Bagging, Decision Tree, and Extra Trees classifiers by 1–6% in multiclass classification. Because these accuracy gains come solely from the new data normalization method, it meets all the prerequisites for practical application in various medical data mining tasks.

Keywords: medical diagnostics; classification accuracy; preprocessing; data normalization; scalers; small data; machine learning; decision trees; binary classification; multiclass classification; precision model
JEL-codes: C
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/11/1942/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/11/1942/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:11:p:1942-:d:832476

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.