A new correlation coefficient between categorical, ordinal and interval variables with Pearson characteristics

M. Baak, R. Koopman, H. Snoek and S. Klous

Computational Statistics & Data Analysis, 2020, vol. 152, issue C

Abstract: A prescription is presented for a new and practical correlation coefficient, ϕK, based on several refinements to Pearson’s hypothesis test of independence of two variables. The combined features of ϕK form an advantage over existing coefficients. Primarily, it works consistently between categorical, ordinal and interval variables, in essence by treating each variable as categorical, and can therefore be used to calculate correlations between variables of mixed type. Second, it captures nonlinear dependency. The strength of ϕK is similar to Pearson’s correlation coefficient, and is equivalent in case of a bivariate normal input distribution. These are useful properties when studying the correlations between variables with mixed types, where some are categorical. Two more innovations are presented: to the proper evaluation of statistical significance of correlations, and to the interpretation of variable relationships in a contingency table, in particular in case of sparse or low statistics samples and significant dependencies. Two practical applications are discussed. The presented algorithms are easy to use and available through a public Python library (https://github.com/KaveIO/PhiK).
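
The abstract's starting point, Pearson's test of independence applied after treating every variable as categorical, can be illustrated with a minimal sketch. This is not the ϕK algorithm and does not use the PhiK library: the refinements the abstract mentions are not reproduced, and the helper names (`bin_interval`, `contingency_table`, `pearson_chi2`), the equal-width binning, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def bin_interval(values, n_bins):
    """Turn an interval variable into categories using equal-width bins.

    Equal-width binning is an assumption of this sketch, not a choice
    taken from the paper.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls in a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def contingency_table(xs, ys):
    """Count how often each pair of categories (x, y) occurs together."""
    rows = sorted(set(xs))
    cols = sorted(set(ys))
    counts = Counter(zip(xs, ys))
    return [[counts[(r, c)] for c in cols] for r in rows]


def pearson_chi2(table):
    """Pearson's chi-square statistic for independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2


# Toy mixed-type data (hypothetical): one categorical, one interval variable.
colour = ["red", "red", "blue", "blue", "red", "blue", "red", "blue"]
height = [1.0, 1.2, 3.1, 2.9, 0.8, 3.3, 1.1, 3.0]

height_bins = bin_interval(height, n_bins=2)
table = contingency_table(colour, height_bins)
chi2 = pearson_chi2(table)
print(table, chi2)
```

Once the interval variable is binned, both variables are plain categories and the same contingency-table statistic applies regardless of the original variable types; a larger statistic indicates a stronger departure from independence.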

Keywords: Data analysis; Correlation; Contingency test; Significance; Simulation
Date: 2020

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947320301341
Full text for ScienceDirect subscribers only.

Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:152:y:2020:i:c:s0167947320301341

DOI: 10.1016/j.csda.2020.107043

Computational Statistics & Data Analysis is currently edited by S.P. Azen

More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-19
Handle: RePEc:eee:csdana:v:152:y:2020:i:c:s0167947320301341