Simple Bayesian binary framework for discovering significant genes and classifying cancer diagnosis

Tae Young Yang

Computational Statistics & Data Analysis, 2009, vol. 53, issue 5, 1743-1754

Abstract: Given a microarray dataset consisting of two classes, type I and type II, the proposed coherent binary framework sequentially combines a gene-rank algorithm and a classifier. Genes that are expressed at a consistently high level in one type and at a consistently low level in the other type are of particular interest. The wider the gap between the expression levels, the more significant the gene is as a discriminator. A new distance metric is used to measure the gap and is obtained using Bayesian nonparametric approaches involving Dirichlet process priors. Significant genes are ranked separately for each of two patterns: genes that are over-expressed in type I and under-expressed in type II, and genes that are under-expressed in type I and over-expressed in type II. An out-of-sample cross-validation approach is suggested for deciding how many significant genes are necessary for the classifier. When a test sample is presented, the classifier uses each selected top-ranked gene to calculate a classification score for each type, and the sample is assigned to the type with the larger score. Empirical studies using two public datasets show that the top-ranked genes in each pattern clearly distinguish the two types, and that the classifier needs only a few significant genes to classify the test samples correctly. The framework is a simple alternative to more complex models with respect to accuracy and robustness.
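
The rank-then-classify pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the Dirichlet-process-based distance metric is not specified in the abstract, so a plain min/max separation gap stands in for it, and a nearest-class-mean vote stands in for the classification score. The function names (`gap_scores`, `rank_genes`, `classify`, `loo_accuracy`) and the toy data are hypothetical.

```python
from statistics import mean

def gap_scores(type1, type2):
    """Per-gene gaps for the two patterns (stand-in for the paper's metric).

    type1, type2: lists of samples; each sample is a list of expression
    values, one per gene. Returns a list of (over_in_1, over_in_2) tuples.
    """
    scores = []
    for g in range(len(type1[0])):
        a = [s[g] for s in type1]
        b = [s[g] for s in type2]
        over_in_1 = min(a) - max(b)  # > 0 if every type I value exceeds every type II value
        over_in_2 = min(b) - max(a)  # > 0 if every type II value exceeds every type I value
        scores.append((over_in_1, over_in_2))
    return scores

def rank_genes(type1, type2, k):
    """Return the top-k gene indices for each pattern, ranked separately."""
    scores = gap_scores(type1, type2)
    genes = range(len(scores))
    pattern_a = sorted(genes, key=lambda g: scores[g][0], reverse=True)[:k]
    pattern_b = sorted(genes, key=lambda g: scores[g][1], reverse=True)[:k]
    return pattern_a, pattern_b

def classify(sample, type1, type2, genes):
    """Each selected gene votes for the type whose mean expression is closer."""
    score1 = score2 = 0
    for g in genes:
        m1 = mean(s[g] for s in type1)
        m2 = mean(s[g] for s in type2)
        if abs(sample[g] - m1) < abs(sample[g] - m2):
            score1 += 1
        else:
            score2 += 1
    # Assumption: ties are assigned to type II arbitrarily.
    return "I" if score1 > score2 else "II"

def loo_accuracy(type1, type2, k):
    """Leave-one-out accuracy when k genes per pattern are used."""
    labelled = [(s, "I") for s in type1] + [(s, "II") for s in type2]
    correct = 0
    for i, (held_out, label) in enumerate(labelled):
        rest = labelled[:i] + labelled[i + 1:]
        t1 = [s for s, lab in rest if lab == "I"]
        t2 = [s for s, lab in rest if lab == "II"]
        top_a, top_b = rank_genes(t1, t2, k)  # ranking never sees the held-out sample
        correct += classify(held_out, t1, t2, top_a + top_b) == label
    return correct / len(labelled)

# Toy data: gene 0 is high in type I, gene 1 is high in type II, gene 2 is noise.
type1 = [[9.0, 1.0, 5.0], [8.5, 1.5, 4.0], [9.5, 0.5, 6.0]]
type2 = [[2.0, 7.0, 5.5], [1.5, 8.0, 4.5], [2.5, 7.5, 5.0]]

top_a, top_b = rank_genes(type1, type2, k=1)
print(top_a, top_b)
print(classify([8.8, 1.2, 5.0], type1, type2, top_a + top_b))
print(loo_accuracy(type1, type2, k=1))
```

Comparing `loo_accuracy` across several values of `k` and keeping the smallest value that performs well mirrors, in simplified form, the abstract's suggestion to use out-of-sample cross-validation to decide how many genes the classifier needs.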

Date: 2009

Full text (ScienceDirect subscribers only):
http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00213-2

Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:53:y:2009:i:5:p:1743-1754

Computational Statistics & Data Analysis is currently edited by S.P. Azen

 
Page updated 2025-03-19
Handle: RePEc:eee:csdana:v:53:y:2009:i:5:p:1743-1754