Supervised Distance Matrices
Katherine S. Pollard and Mark J. van der Laan
Additional contact information
Katherine S. Pollard: University of California, San Francisco
Mark J. van der Laan: University of California, Berkeley
Statistical Applications in Genetics and Molecular Biology, 2008, vol. 7, issue 1, 30
Abstract:
We introduce a novel statistical concept, called a supervised distance matrix, which quantifies pairwise similarity between variables in terms of their association with an outcome. Supervised distance matrices are derived in two stages. First, the observed data is transformed based on particular working models for association. Examples of transformations include residuals or influence curves from regression models. In the second stage, a choice of distance measure is used to compute all pairwise distances between variables in the transformed data. We present consistent estimators of the resulting distance matrix, including an inverse probability of censoring weighted estimator for use with right-censored outcomes. Supervised distance matrices can be used with standard (unsupervised) clustering algorithms to identify groups of similarly predictive variables and to discover subpopulations of related samples. This approach is illustrated using simulations and an analysis of gene expression data with a censored survival outcome. The proposed methods are widely applicable in genomics and other fields where high-dimensional data is collected on each subject.
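The two-stage construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simple working model (a separate least-squares regression of the outcome on each variable, with residuals as the transformation) and Euclidean distance in the second stage. The function names `residual_transform` and `supervised_distance_matrix` are hypothetical, the toy data are simulated, and censoring is not handled.

```python
import numpy as np


def residual_transform(X, y):
    """Stage 1 (assumed working model): regress y on each column of X
    separately by least squares with an intercept, and return the
    n x p matrix whose j-th column holds the residuals for variable j.
    Assumes no column of X is constant."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = X - X.mean(axis=0)          # center each variable
    yc = y - y.mean()                # center the outcome
    slopes = (xc * yc[:, None]).sum(axis=0) / (xc ** 2).sum(axis=0)
    return yc[:, None] - xc * slopes


def supervised_distance_matrix(X, y):
    """Stage 2 (assumed distance): Euclidean distance between every
    pair of columns of the transformed data."""
    R = residual_transform(X, y)
    diff = R[:, :, None] - R[:, None, :]   # shape n x p x p
    return np.sqrt((diff ** 2).sum(axis=0))


# Toy data: variables 0 and 1 carry the same signal, variable 2 is noise.
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
X = np.column_stack([
    signal + 0.1 * rng.normal(size=n),
    signal + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])
y = signal + 0.1 * rng.normal(size=n)

D = supervised_distance_matrix(X, y)
print(np.round(D, 2))
```

In this toy setup the two variables that predict the outcome in the same way end up much closer to each other than either is to the noise variable. A matrix like `D` could then be handed to a standard clustering routine, for example by converting it with `scipy.spatial.distance.squareform` and passing the result to `scipy.cluster.hierarchy.linkage`.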
Keywords: distance; clustering; regression; survival; censoring
Date: 2008
Downloads:
https://doi.org/10.2202/1544-6115.1404 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:7:y:2008:i:1:n:33
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html
DOI: 10.2202/1544-6115.1404
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
Bibliographic data for series maintained by Peter Golla.