A comparison of two dissimilarity functions for mixed-type predictor variables in the δ-machine

Beibei Yuan, Willem Heiser and Mark Rooij
Additional contact information
Beibei Yuan: Leiden University
Willem Heiser: Leiden University
Mark Rooij: Leiden University

Advances in Data Analysis and Classification, 2022, vol. 16, issue 4, No 4, 875-907

Abstract: The δ-machine is a statistical learning tool for classification based on dissimilarities or distances between profiles of the observations and profiles of a representation set, which was proposed by Yuan et al. (J Classif 36(3): 442–470, 2019). So far, the δ-machine has been restricted to continuous predictor variables. In this article, we extend the δ-machine to handle continuous, ordinal, nominal, and binary predictor variables. We use a tailored dissimilarity function for mixed-type variables that was defined by Gower. This measure has the properties of a Manhattan distance. We develop, in a similar vein, a Euclidean dissimilarity function for mixed-type variables. In simulation studies we compare the performance of the two dissimilarity functions, and we compare the predictive performance of the δ-machine with that of logistic regression models. We generated data according to two population distributions in which the type of predictor variables, the distribution of categorical variables, and the number of predictor variables were varied. The performance of the δ-machine using the two dissimilarity functions and different types of representation set was investigated. The simulation studies showed that the adjusted Euclidean dissimilarity function performed better than the adjusted Gower dissimilarity function; that the δ-machine outperformed logistic regression; and that, for constructing the representation set, K-medoids clustering yielded fewer active exemplars than K-means clustering while maintaining accuracy. We also applied the δ-machine to an empirical example, discussed its interpretation in detail, and compared its classification performance with that of five other classification methods. The results showed that the δ-machine strikes a good balance between accuracy and interpretability.
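
Illustrative sketch (not taken from the article): the abstract contrasts a Manhattan-type Gower dissimilarity with a Euclidean-type counterpart for mixed-type variables. The Python below shows one simple way such a pair could look, under stated assumptions: numeric and ordinal variables contribute a range-scaled absolute difference, nominal and binary variables contribute a simple-matching 0/1 term, and the Euclidean variant takes the square root of the summed squared contributions. The function names, the `kinds` and `ranges` arguments, and the scaling choices are hypothetical; the article's "adjusted" versions may well differ in detail.

```python
import math

def per_variable_difference(xi, yi, kind, rng):
    """Contribution of one variable, in [0, 1] (assumed scaling, not the paper's)."""
    if kind == "numeric":
        # Range-scaled absolute difference; ordinal variables are assumed
        # to be passed in as numeric ranks.
        return abs(xi - yi) / rng if rng and rng > 0 else 0.0
    # Nominal or binary: simple matching.
    return 0.0 if xi == yi else 1.0

def gower_dissimilarity(x, y, kinds, ranges):
    """Manhattan-type: average of the per-variable contributions."""
    parts = [per_variable_difference(a, b, k, r)
             for a, b, k, r in zip(x, y, kinds, ranges)]
    return sum(parts) / len(parts)

def euclidean_mixed_dissimilarity(x, y, kinds, ranges):
    """Euclidean-type analogue: root of the summed squared contributions."""
    parts = [per_variable_difference(a, b, k, r)
             for a, b, k, r in zip(x, y, kinds, ranges)]
    return math.sqrt(sum(p * p for p in parts))

def dissimilarity_features(rows, exemplars, kinds, ranges, dissimilarity):
    """Dissimilarities from each observation to each exemplar of a
    representation set; these columns would serve as predictors in a
    downstream classifier."""
    return [[dissimilarity(row, ex, kinds, ranges) for ex in exemplars]
            for row in rows]

# Made-up toy profiles: one continuous, one nominal, one ordinal variable.
kinds = ("numeric", "nominal", "numeric")
ranges = (4.0, None, 3.0)
x = (1.0, "a", 2)
y = (3.0, "b", 2)
print(gower_dissimilarity(x, y, kinds, ranges))
print(euclidean_mixed_dissimilarity(x, y, kinds, ranges))
```

Both functions share the same per-variable contributions, so any difference between them comes only from how those contributions are aggregated, which mirrors the Manhattan versus Euclidean contrast described in the abstract.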

Keywords: Dissimilarity; Nonlinear classification; Mixed-type data; Monte Carlo; 62H30
Date: 2022
Downloads: (external link)
http://link.springer.com/10.1007/s11634-021-00463-6 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:16:y:2022:i:4:d:10.1007_s11634-021-00463-6

Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2

DOI: 10.1007/s11634-021-00463-6

Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs

More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:advdac:v:16:y:2022:i:4:d:10.1007_s11634-021-00463-6