EconPapers    
Economics at your fingertips  
 

Comparing the Characteristics of Gene Expression Profiles Derived by Univariate and Multivariate Classification Methods

Zucknick Manuela, Richardson Sylvia and Stronach Euan A
Additional contact information
Zucknick Manuela: German Cancer Research Centre
Richardson Sylvia: Imperial College London
Stronach Euan A: Imperial College London

Statistical Applications in Genetics and Molecular Biology, 2008, vol. 7, issue 1, 34

Abstract: One application of gene expression arrays is to derive molecular profiles, i.e., sets of genes, which discriminate well between two classes of samples, for example between tumour types. Users are confronted with a multitude of classification methods of varying complexity that can be applied to this task. To help decide which method to use in a given situation, we compare important characteristics of a range of classification methods, including simple univariate filtering, penalised likelihood methods and the random forest.Classification accuracy is an important characteristic, but the biological interpretability of molecular profiles is also important. This implies both parsimony and stability, in the sense that profiles should not vary much when there are slight changes in the training data. We perform a random resampling study to compare these characteristics between the methods and across a range of profile sizes. We measure stability by adopting the Jaccard index to assess the similarity of resampled molecular profiles.We carry out a case study on five well-established cancer microarray data sets, for two of which we have the benefit of being able to validate the results in an independent data set. The study shows that those methods which produce parsimonious profiles generally result in better prediction accuracy than methods which don't include variable selection. For very small profile sizes, the sparse penalised likelihood methods tend to result in more stable profiles than univariate filtering while maintaining similar predictive performance.

Keywords: microarrays; molecular signature; classification; multivariate analysis; penalised likelihood (search for similar items in EconPapers)
Date: 2008
References: View complete reference list from CitEc
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://doi.org/10.2202/1544-6115.1307 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:7:y:2008:i:1:n:7

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.2202/1544-6115.1307

Access Statistics for this article

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla ().

 
Page updated 2025-03-19
Handle: RePEc:bpj:sagmbi:v:7:y:2008:i:1:n:7