A Novel Approach to Decision-Making on Diagnosing Oncological Diseases Using Machine Learning Classifiers Based on Datasets Combining Known and/or New Generated Features of a Different Nature
Liliya A. Demidova
Additional contact information
Liliya A. Demidova: Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia
Mathematics, 2023, vol. 11, issue 4, 1-39
Abstract:
This paper deals with the problem of diagnosing oncological diseases based on blood protein markers. The goal of the study is to develop a novel approach in decision-making on diagnosing oncological diseases based on blood protein markers by generating datasets that include various combinations of features: both known features corresponding to blood protein markers and new features generated with the help of mathematical tools, particularly with the involvement of the non-linear dimensionality reduction algorithm UMAP, formulas for various entropies and fractal dimensions. These datasets were used to develop a group of multiclass kNN and SVM classifiers using oversampling algorithms to solve the problem of class imbalance in the dataset, which is typical for medical diagnostics problems. The results of the experimental studies confirmed the feasibility of using the UMAP algorithm and approximation entropy, as well as Katz and Higuchi fractal dimensions to generate new features based on blood protein markers. Various combinations of these features can be used to expand the set of features from the original dataset in order to improve the quality of the received classification solutions for diagnosing oncological diseases. The best kNN and SVM classifiers were developed based on the original dataset augmented respectively with a feature based on the approximation entropy and features based on the UMAP algorithm and the approximation entropy. At the same time, the average values of the metric MacroF 1 - score used to assess the quality of classifiers during cross-validation increased by 16.138% and 4.219%, respectively, compared to the average values of this metric in the case when the original dataset was used in the development of classifiers of the same name.
Keywords: decision-making; oncological disease; kNN classifier; SVM classifier; dataset; features; UMAP algorithm; entropy; fractal dimension (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2227-7390/11/4/792/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/4/792/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:4:p:792-:d:1057729
Access Statistics for this article
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().