An approach of improving decision tree classifier using condensed informative data

Panhalkar, Archana R.; Doye, Dharmpal D.

Archana R. Panhalkar and Dharmpal D. Doye
Additional contact information
Archana R. Panhalkar: Shri Guru Gobind Singhji Institute of Engineering and Technology
Dharmpal D. Doye: Shri Guru Gobind Singhji Institute of Engineering and Technology

DECISION: Official Journal of the Indian Institute of Management Calcutta, 2020, vol. 47, issue 4, No 8, 445 pages

Abstract: New technologies produce vast amounts of data. Storing, analyzing and mining knowledge from such data requires large storage space and high execution speed, and training classifiers on it takes more time and space. To avoid wasting time and space, significant information must be mined from a huge collection of data. The decision tree is one of the promising classifiers for mining knowledge from huge data. This paper aims to reduce the data in order to construct an efficient decision tree classifier. It presents a method that finds informative data to improve the performance of the decision tree classifier. Two clustering-based methods are proposed for dimensionality reduction and for utilizing knowledge from outliers. The condensed data are then applied to the decision tree for high prediction accuracy. The uniqueness of the first method is that it finds representative instances from clusters that utilize knowledge of their neighboring data. The second method uses supervised clustering, which finds the number of cluster representatives for the reduction of data. Along with increasing the prediction accuracy of a tree, these methods decrease the size, building time and space required for decision tree classifiers. These novel methods are united into a single supervised and unsupervised Decision Tree based on Cluster Analysis Pre-processing (DTCAP), which finds the informative instances in small, medium and large datasets. The experiments are conducted on standard UCI datasets of different sizes. They show that the method, despite its simplicity, reduces the data by up to 50% and produces a quality dataset that enhances the performance of the decision tree classifier.
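The general idea of condensing a training set through clustering before building a tree can be illustrated with a minimal sketch. This is not the authors' DTCAP procedure, whose details are not given in the abstract. It assumes plain k-means and one possible selection rule: within each (cluster, class) group, keep the fraction of instances closest to the cluster centroid. The names `kmeans`, `condense` and the `keep` parameter are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        # Index of the centroid closest to p (Euclidean distance).
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p) for p in points], centroids


def condense(points, targets, k, keep=0.5, seed=0):
    """Return sorted indices of the instances kept as representatives.

    Assumed rule (illustrative only): group instances by (cluster, class)
    and keep the `keep` fraction of each group nearest to its centroid,
    with at least one instance per group.
    """
    labels, centroids = kmeans(points, k, seed=seed)
    groups = {}
    for i, (lab, y) in enumerate(zip(labels, targets)):
        groups.setdefault((lab, y), []).append(i)
    kept = []
    for (lab, _), idxs in groups.items():
        idxs.sort(key=lambda i: math.dist(points[i], centroids[lab]))
        kept.extend(idxs[:max(1, math.ceil(keep * len(idxs)))])
    return sorted(kept)
```

The indices returned by `condense` select a reduced training set, which could then be passed to any decision tree learner (for example a C4.5-style implementation) in place of the full data.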

Keywords: Data mining; Decision tree classifier; K-means clustering; C4.5; Instance reduction
Date: 2020

Downloads: (external link)
http://link.springer.com/10.1007/s40622-020-00265-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:decisn:v:47:y:2020:i:4:d:10.1007_s40622-020-00265-3

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/40622

DOI: 10.1007/s40622-020-00265-3

DECISION: Official Journal of the Indian Institute of Management Calcutta is currently edited by Rajesh Babu

More articles in DECISION: Official Journal of the Indian Institute of Management Calcutta from Springer, Indian Institute of Management Calcutta
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.