On the scalability of ordered multi-class ROC analysis
Willem Waegeman, Bernard De Baets and Luc Boullart
Computational Statistics & Data Analysis, 2008, vol. 52, issue 7, 3371-3388
Abstract:
Receiver operating characteristic (ROC) analysis provides a way to select possibly optimal models for discriminating between two kinds of objects without the need to specify the cost or class distribution. It is now established as a standard analysis tool in various domains, including medical decision making, pattern recognition and machine learning. Recently, an extension to the ordered multi-class case has been proposed, in which the concept of an ROC curve is generalized to an r-dimensional surface for r ordered categories, and the volume under this ROC surface (VUS) measures the overall power of a model to classify objects of the various categories. However, the computation of this criterion, as well as of the U-statistics estimators of its variance and covariance for two models, is believed to be complex. New algorithms to compute VUS and its (co)variance estimators are presented. In particular, the volume under the ROC surface can be found very efficiently with a simple dynamic program dominated by a single sorting operation on the data set. For the variance and covariance, the respective estimators are reformulated as a series of recurrent functions over layered data graphs, and these functions are then rapidly evaluated with a dynamic program. Simulation experiments confirm that the presented algorithms scale well with respect to the size of the data set and the number of categories. For example, the volume under the ROC surface could be rapidly computed on very large data sets of more than 500 000 instances, while a naive implementation spent much more time on data sets of size less than 1000.
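To make the contrast between the sorting-based dynamic program and a naive computation concrete, the following is a minimal Python sketch. It is not the authors' implementation; it rests on several assumptions that the abstract does not state: VUS is taken to be the fraction of r-tuples (one object per category) whose model scores are strictly increasing in category order, ties count as incorrectly ordered, and labels are integers 1..r. The function names vus_dp and vus_naive are hypothetical.

```python
from itertools import product


def vus_dp(scores, labels, r):
    """Sketch of a sort-then-count dynamic program (assumed VUS definition).

    counts[k] holds the number of strictly increasing chains of objects
    with labels 1, 2, ..., k among the objects processed so far.
    """
    counts = [0] * (r + 1)
    counts[0] = 1  # the empty chain, so a label-1 object starts one chain
    # Sort by score; among equal scores, process higher labels first so
    # that tied objects never extend each other's chains (strict order).
    ordered = sorted(zip(scores, labels), key=lambda p: (p[0], -p[1]))
    for _, k in ordered:
        counts[k] += counts[k - 1]
    total = 1
    for k in range(1, r + 1):
        total *= sum(1 for y in labels if y == k)
    if total == 0:
        raise ValueError("every category needs at least one object")
    return counts[r] / total


def vus_naive(scores, labels, r):
    """Brute-force reference: enumerate every r-tuple explicitly."""
    groups = [[s for s, y in zip(scores, labels) if y == k]
              for k in range(1, r + 1)]
    good = 0
    total = 0
    for tup in product(*groups):
        total += 1
        if all(a < b for a, b in zip(tup, tup[1:])):
            good += 1
    if total == 0:
        raise ValueError("every category needs at least one object")
    return good / total
```

Under these assumptions, the dynamic program costs one sort plus a single linear pass, whereas the brute-force version enumerates the product of all category sizes, which is consistent with the scaling gap described in the abstract. For a perfectly ranked toy data set, vus_dp([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [1, 1, 2, 2, 3, 3], 3) returns 1.0.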
Date: 2008
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167-9473(07)00452-5
Full text for ScienceDirect subscribers only.
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:52:y:2008:i:7:p:3371-3388
Computational Statistics & Data Analysis is currently edited by S.P. Azen