A Comparison of Reliability Coefficients for Ordinal Rating Scales

Alexandra de Raadt, Matthijs J. Warrens, Roel J. Bosker and Henk A. L. Kiers
Additional contact information
Alexandra de Raadt: University of Groningen
Matthijs J. Warrens: University of Groningen
Roel J. Bosker: University of Groningen
Henk A. L. Kiers: University of Groningen

Journal of Classification, 2021, vol. 38, issue 3, No 6, 519-543

Abstract: Kappa coefficients are commonly used to quantify reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients are Cohen’s kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients are the intraclass correlation ICC(3,1), Pearson’s correlation, Spearman’s rho, and Kendall’s tau-b. The primary goal is to provide a thorough understanding of these coefficients so that applied researchers can make a sensible choice for ordinal rating scales. A second aim is to find out whether the choice of coefficient matters. Using analytic methods as well as simulated and empirical data, we studied to what extent different coefficients lead to the same conclusions about inter-rater reliability and to what extent they measure agreement in a similar way. Analytically, it is shown that differences between quadratically weighted kappa and the Pearson and intraclass correlations increase as agreement becomes larger; differences between these three coefficients are generally small when differences between rater means and variances are small. Furthermore, the simulated and empirical data show that differences between all reliability coefficients tend to increase as agreement between the raters increases. For the data in this study, the four correlation coefficients led to the same conclusion about inter-rater reliability in virtually all cases, and quadratically weighted kappa led to a similar conclusion as the correlation coefficients in a great number of cases. Hence, for these data, it matters little which of these five coefficients is used. Moreover, the four correlation coefficients and quadratically weighted kappa tend to measure agreement in a similar way: their values are very highly correlated for the data in this study.
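To make the comparison concrete, the sketch below shows one way to compute the seven coefficients named in the abstract for two raters scoring the same subjects on an ordinal scale. It is not the authors' code: the toy ratings and the icc_3_1 helper are assumptions for illustration, with the kappa coefficients taken from scikit-learn and the correlation coefficients from SciPy.

```python
# Minimal sketch (not the authors' code): the seven reliability coefficients
# compared in the article, for two raters on a toy 1-5 ordinal scale.
# The ratings below are invented purely for illustration.
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau
from sklearn.metrics import cohen_kappa_score

rater1 = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])
rater2 = np.array([1, 2, 3, 3, 3, 4, 4, 4, 5, 4])

def icc_3_1(x, y):
    """ICC(3,1): two-way mixed effects, consistency, single rater
    (Shrout & Fleiss), from a two-way ANOVA decomposition."""
    X = np.column_stack([x, y]).astype(float)               # n subjects x k raters
    n, k = X.shape
    grand = X.mean()
    ss_rows = k * ((X.mean(axis=1) - grand) ** 2).sum()     # between subjects
    ss_cols = n * ((X.mean(axis=0) - grand) ** 2).sum()     # between raters
    ss_err = ((X - grand) ** 2).sum() - ss_rows - ss_cols   # residual
    bms = ss_rows / (n - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)

results = {
    "Cohen's kappa": cohen_kappa_score(rater1, rater2),
    "Linearly weighted kappa": cohen_kappa_score(rater1, rater2, weights="linear"),
    "Quadratically weighted kappa": cohen_kappa_score(rater1, rater2, weights="quadratic"),
    "ICC(3,1)": icc_3_1(rater1, rater2),
    "Pearson's correlation": pearsonr(rater1, rater2)[0],
    "Spearman's rho": spearmanr(rater1, rater2)[0],
    "Kendall's tau-b": kendalltau(rater1, rater2)[0],       # tau-b is SciPy's default
}

for name, value in results.items():
    print(f"{name:30s} {value:.3f}")
```

On data like this, the quadratically weighted kappa and the four correlation coefficients typically take very similar values, which is the pattern the article examines in detail.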

Keywords: Inter-rater reliability; Cohen’s kappa; Linearly weighted kappa; Quadratically weighted kappa; Intraclass correlation; Pearson’s correlation; Spearman’s rho; Kendall’s tau-b
Date: 2021
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://link.springer.com/10.1007/s00357-021-09386-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:jclass:v:38:y:2021:i:3:d:10.1007_s00357-021-09386-5

Ordering information: This journal article can be ordered from
http://www.springer. ... hods/journal/357/PS2

DOI: 10.1007/s00357-021-09386-5

Journal of Classification is currently edited by Douglas Steinley

More articles in Journal of Classification from Springer, The Classification Society
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Handle: RePEc:spr:jclass:v:38:y:2021:i:3:d:10.1007_s00357-021-09386-5