Optimal Feature Selection for Decision Trees Induction Using a Genetic Algorithm Wrapper - A Model Approach
Prokopis K. Theodoridis and Dimitris C. Gkikas
Additional contact information
Prokopis K. Theodoridis: University of Patras
Dimitris C. Gkikas: University of Patras
A chapter in Strategic Innovative Marketing and Tourism, 2020, pp 583-591 from Springer
Abstract:
This paper describes a model for classifying optimised subsets of data. Two algorithms, a genetic algorithm and a decision tree classifier, are run in a seemingly parallel fashion as a wrapper method, with the aim of decreasing overfitting while maintaining accuracy. A wrapper method measures how useful features are by the performance of the classifier trained on them. When big datasets are classified, the risk of overfitting is high. Thus, instead of classifying the whole dataset, a "smarter" approach classifies subsets of data, also called chromosomes, which are generated by a genetic algorithm. The genetic algorithm searches for the best chromosomes across a series of populations called generations. It produces a large number of chromosomes, each containing a certain number of attributes, also called genes; the decision tree classifies each chromosome and assigns it a fitness value. This fitness value is the classification accuracy that the chromosome achieved in the classification process. Only the strongest chromosomes pass on to the next generation. This method reduces the number of genes classified and, at the same time, lowers the risk of overfitting. At the end, the fittest chromosomes (that is, the fittest sets of genes, or subsets of attributes) are presented. The method supports faster and more accurate decision making. The wrapper can be applied to digital marketing campaign metrics, analytics metrics, website ranking factors, content curation, keyword research, consumer/visitor behavior analysis, and other areas of marketing and business interest.
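The loop the abstract describes (generate chromosomes, score each by decision tree accuracy, keep the strongest, breed the next generation) might be sketched as follows. This is a minimal illustration, not the authors' implementation. It rests on several assumptions the abstract does not state: a chromosome is encoded as a 0/1 mask over attributes, fitness is accuracy on a held-out split, selection keeps the top half, and offspring come from single-point crossover with bit-flip mutation. The names `build_tree`, `fitness`, `select_features` and `make_data` are hypothetical, the small Gini-based tree stands in for whatever decision tree learner the chapter uses, and the data is synthetic.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features, depth=3):
    """Grow a small binary tree that may split only on the given attribute indices."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:  # thresholds leaving both sides non-empty
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no attribute can separate the rows
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], features, depth - 1),
            build_tree([rows[i] for i in right], [labels[i] for i in right], features, depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fitness(chromosome, train, test):
    """Fitness = held-out accuracy of a tree trained on the attributes the mask selects."""
    features = [i for i, gene in enumerate(chromosome) if gene]
    if not features:
        return 0.0
    (x_train, y_train), (x_test, y_test) = train, test
    tree = build_tree(x_train, y_train, features)
    return sum(predict(tree, r) == y for r, y in zip(x_test, y_test)) / len(y_test)

def select_features(train, test, n_features, pop_size=12, generations=8,
                    mutation_rate=0.1, seed=0):
    """Genetic algorithm wrapper: evolve attribute masks, keeping the fittest half."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: fitness(c, train, test), reverse=True)
        survivors = ranked[:pop_size // 2]  # only the strongest chromosomes pass on
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            children.append([1 - g if rng.random() < mutation_rate else g for g in child])
        population = survivors + children
    best = max(population, key=lambda c: fitness(c, train, test))
    return best, fitness(best, train, test)

def make_data(n, rng):
    """Synthetic data (an assumption): attributes 0 and 1 decide the label, 2 to 5 are noise."""
    rows = [[rng.random() for _ in range(6)] for _ in range(n)]
    return rows, [int(r[0] + r[1] > 1.0) for r in rows]

data_rng = random.Random(42)
train, test = make_data(60, data_rng), make_data(40, data_rng)
best, acc = select_features(train, test, n_features=6)
print("selected attribute mask:", best, "held-out accuracy:", round(acc, 3))
```

Because the same held-out split is used both to score and to select chromosomes, the reported accuracy is optimistic. A more careful setup would score fitness with cross-validation and report the final accuracy on a separate split.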
Keywords: Decision trees; Genetic algorithm; Data classification; Data optimisation; Overfitting; Classification accuracy; Chromosomes; Genes
Date: 2020
Persistent link: https://EconPapers.repec.org/RePEc:spr:prbchp:978-3-030-36126-6_65
Ordering information: This item can be ordered from
http://www.springer.com/9783030361266
DOI: 10.1007/978-3-030-36126-6_65
More chapters in Springer Proceedings in Business and Economics from Springer