Decision tree approaches for zero-inflated count data

Seong-Keon Lee and Seohoon Jin

Journal of Applied Statistics, 2006, vol. 33, issue 8, 853-865

Abstract: Many methodologies for zero-inflated data have been developed in statistics. However, there is little literature in the data mining field, even though zero-inflated data are common in real applications. In fact, there is no decision tree method suited to zero-inflated responses. To analyze a continuous target variable with decision trees, one of the standard data mining techniques, we use F-statistics (CHAID) or variance reduction (CART) as the criterion for finding the best split. But these criteria are appropriate only for a continuous target variable. If the target variable consists of rare events or zero-inflated count data, they may not give good results because of the distribution of such data. In this paper, we propose a decision tree for zero-inflated count data that uses the maximized zero-inflated Poisson likelihood as the split criterion. In addition, we compare the performance of the split criteria on well-known data sets. When the analyst is interested in low-value groups (e.g. no-defect areas, customers who do not claim), the suggested ZIP tree would be more efficient.
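The split criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each node is modelled by an intercept-only ZIP distribution with P(Y=0) = pi + (1-pi)exp(-lam) and P(Y=k) = (1-pi)exp(-lam)lam^k/k! for k > 0, fits (pi, lam) by a simple EM iteration, and scores a candidate split by the gain in maximized log-likelihood of the two children over the parent. The function names (`zip_loglik`, `fit_zip`, `best_split`) and the `min_leaf` parameter are hypothetical and chosen only for this example.

```python
import math

def zip_loglik(ys, pi, lam):
    """Log-likelihood of counts ys under ZIP(pi, lam), with lam > 0 and pi < 1."""
    ll = 0.0
    for y in ys:
        if y == 0:
            # A zero is either structural (prob. pi) or a Poisson zero.
            ll += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            ll += (math.log(1.0 - pi) - lam + y * math.log(lam)
                   - math.lgamma(y + 1))
    return ll

def fit_zip(ys, iters=200):
    """EM fit of an intercept-only ZIP model; returns (pi, lam, loglik)."""
    n, total = len(ys), sum(ys)
    if total == 0:
        # All zeros: degenerate fit that assigns probability 1 to the data.
        return 1.0, 0.0, 0.0
    n0 = sum(1 for y in ys if y == 0)
    if n0 == 0:
        # No zeros: the constrained fit is a plain Poisson (pi = 0).
        lam = total / n
        return 0.0, lam, zip_loglik(ys, 0.0, lam)
    pi, lam = 0.5 * n0 / n, total / n
    for _ in range(iters):
        # E-step: expected number of structural zeros among the n0 zeros.
        z = n0 * pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        pi = z / n
        lam = total / (n - z)
    return pi, lam, zip_loglik(ys, pi, lam)

def best_split(xs, ys, min_leaf=5):
    """Return (threshold, gain) for the rule x <= threshold that maximizes
    the summed child ZIP log-likelihood minus the parent log-likelihood."""
    parent_ll = fit_zip(ys)[2]
    best_t, best_gain = None, 0.0
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        gain = fit_zip(left)[2] + fit_zip(right)[2] - parent_ll
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Toy data (made up for illustration): low counts for small x, larger
# counts with some excess zeros for large x.
xs = list(range(1, 21))
ys = [0, 0, 1, 0, 0, 0, 2, 0, 0, 1, 5, 6, 0, 4, 7, 5, 0, 6, 5, 0]
threshold, gain = best_split(xs, ys)
```

Because the parent model is nested in the pair of child models, the gain is non-negative up to EM convergence error; a full tree would apply `best_split` recursively over all predictors and add a stopping or pruning rule, which this sketch leaves out.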

Keywords: Data mining; decision tree; homogeneity; maximum likelihood; zero-inflated Poisson (ZIP)
Date: 2006

Downloads: (external link)
http://www.tandfonline.com/doi/abs/10.1080/02664760600743613 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:33:y:2006:i:8:p:853-865

DOI: 10.1080/02664760600743613

Journal of Applied Statistics is currently edited by Robert Aykroyd

Handle: RePEc:taf:japsta:v:33:y:2006:i:8:p:853-865