A model selection criterion for model-based clustering of annotated gene expression data

Gallopin Mélina, Celeux Gilles, Jaffrézic Florence and Rau Andrea
Additional contact information
Gallopin Mélina: Laboratoire de Mathématiques, UMR 8628, Université Paris-Sud, 91405, Orsay Cedex, France; INRA, UMR 1313 Génétique Animale et Biologie Intégrative, 78352 Jouy-en-Josas, France
Celeux Gilles: Inria Saclay Île-de-France, Université Paris-Sud, 91405, Orsay Cedex, France
Jaffrézic Florence: INRA, UMR 1313 Génétique Animale et Biologie Intégrative, 78352 Jouy-en-Josas, France; AgroParisTech, UMR 1313 Génétique Animale et Biologie Intégrative, 75231 Paris, France
Rau Andrea: INRA, UMR 1313 Génétique Animale et Biologie Intégrative, 78352 Jouy-en-Josas, France; AgroParisTech, UMR 1313 Génétique Animale et Biologie Intégrative, 75231 Paris, France

Statistical Applications in Genetics and Molecular Biology, 2015, vol. 14, issue 5, 413-428

Abstract: In co-expression analyses of gene expression data, it is often of interest to interpret clusters of co-expressed genes with respect to a set of external information, such as a potentially incomplete list of functional properties for which a subset of genes may be annotated. Based on the framework of finite mixture models, we propose a model selection criterion that takes into account such external gene annotations, providing an efficient tool for selecting a relevant number of clusters and clustering model. This criterion, called the integrated completed annotated likelihood (ICAL), is defined by adding an entropy term to a penalized likelihood to measure the concordance between a clustering partition and the external annotation information. The ICAL leads to the choice of a model that is more easily interpretable with respect to the known functional gene annotations. We illustrate the interest of this model selection criterion in conjunction with Gaussian mixture models on simulated gene expression data and on real RNA-seq data.
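
The abstract describes the criterion only as "adding an entropy term to a penalized likelihood"; the exact form of the ICAL, and in particular how the gene annotations enter the entropy term, is given in the full paper and is not reproduced here. As a rough illustration of the general pattern only, the sketch below combines a BIC-style penalized log-likelihood with the entropy of soft cluster memberships, as in ICL-type criteria. All function names and input values are hypothetical, and this is not the authors' ICAL.

```python
import math


def penalized_loglik(loglik, n_params, n_obs):
    """BIC-style penalized log-likelihood, in the 'larger is better' convention."""
    return loglik - 0.5 * n_params * math.log(n_obs)


def membership_entropy(posteriors):
    """Entropy of soft memberships: -sum_i sum_k t_ik * log(t_ik).

    `posteriors` is a list of rows; row i holds the posterior probabilities
    that observation i belongs to each cluster. Zero entries contribute 0.
    """
    total = 0.0
    for row in posteriors:
        for t in row:
            if t > 0.0:
                total -= t * math.log(t)
    return total


def entropy_penalized_criterion(loglik, n_params, posteriors):
    """Penalized log-likelihood minus membership entropy (ICL-style sketch).

    Assumption: this is a generic entropy-penalized criterion, not the
    ICAL of the paper, whose entropy term involves external annotations.
    """
    n_obs = len(posteriors)
    return penalized_loglik(loglik, n_params, n_obs) - membership_entropy(posteriors)


# Hypothetical inputs: same fit and complexity, different partition sharpness.
crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

score_crisp = entropy_penalized_criterion(-10.0, 5, crisp)
score_fuzzy = entropy_penalized_criterion(-10.0, 5, fuzzy)
```

With equal log-likelihood and number of parameters, the model with well-separated memberships scores higher than the one with ambiguous memberships, which is how an entropy term steers selection toward a particular kind of partition.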

Keywords: functional gene annotation; gene expression data; model-based clustering; model selection
Date: 2015

Downloads: (external link)
https://doi.org/10.1515/sagmb-2014-0095 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:14:y:2015:i:5:p:413-428:n:5

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.1515/sagmb-2014-0095

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.

Page updated 2025-03-19
Handle: RePEc:bpj:sagmbi:v:14:y:2015:i:5:p:413-428:n:5