A mixed integer programming-based global optimization framework for analyzing gene expression data

Felici, Giovanni; Tripathi, Kumar Parijat; Evangelista, Daniela; Guarracino, Mario Rosario

Giovanni Felici, Kumar Parijat Tripathi, Daniela Evangelista and Mario Rosario Guarracino
Additional contact information
Giovanni Felici: IASI-CNR
Kumar Parijat Tripathi: ICAR-CNR
Daniela Evangelista: ICAR-CNR
Mario Rosario Guarracino: ICAR-CNR

Journal of Global Optimization, 2017, vol. 69, issue 3, No 10, 727-744

Abstract: The analysis of high-throughput gene expression experiments comparing patients and controls is based on the determination of differentially expressed genes according to standard statistical tests. A typical bioinformatics approach to this problem is composed of two separate steps: first, a subset of genes with altered expression levels is identified; then the pathways that are statistically enriched by those genes are selected, on the assumption that they play a relevant role in the biological condition under study. Often, the set of selected pathways contains elements that are not related to the condition, because statistical significance is not sufficient for biological relevance. To overcome this problem, we propose a method based on a large mixed integer program that implements a new feature selection model to simultaneously identify the genes whose over- and under-expression, combined, discriminates between different cancer subtypes, as well as the pathways that are enriched by these genes. The innovation in this model is that the solutions are driven towards the enrichment of pathways. This may introduce a bias in the search; the bias is counterbalanced by a wide exploration of the solution space, varying the involved parameters over their feasible region and then using a global optimization approach. The joint analysis of the pool of solutions obtained by this exploration should provide a robust final set of genes and pathways, overcoming the potential drawbacks of relying solely on statistical significance. Experimental results on transcriptomes for different types of cancer from The Cancer Genome Atlas are presented. The method is able to identify crisp relations between the considered cancer subtypes and a small number of selected pathways, ultimately validated by the biological analysis.
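
Illustrative sketch (not from the paper): the abstract describes selecting genes that discriminate subtypes while steering solutions towards pathway enrichment, then pooling the solutions found under varied parameters. The toy Python below mimics that idea under loud assumptions. The data, the pathway map, the scoring function, the `weight` parameter and the `min_hits` threshold are all invented for illustration, and exhaustive enumeration stands in for a mixed integer programming solver. It is not the authors' formulation.

```python
from itertools import combinations
from collections import Counter

# Invented toy data: rows are samples, columns are genes
# (1 = over-expressed, -1 = under-expressed, 0 = unchanged).
GENES = ["g1", "g2", "g3", "g4", "g5"]
SUBTYPE_A = [(1, 0, -1, 0, 1), (1, 1, -1, 0, 0)]
SUBTYPE_B = [(-1, 0, -1, 1, 1), (0, 1, 1, 1, 0)]
# Hypothetical pathway membership, given as gene column indices.
PATHWAYS = {"p1": {0, 3}, "p2": {1, 2, 4}}


def separated_pairs(selected):
    """Count cross-subtype sample pairs that differ on at least one selected gene."""
    return sum(
        any(a[j] != b[j] for j in selected)
        for a in SUBTYPE_A
        for b in SUBTYPE_B
    )


def enriched_pathways(selected, min_hits=2):
    """Names of pathways containing at least `min_hits` selected genes (assumed rule)."""
    chosen = set(selected)
    return sorted(
        name for name, members in PATHWAYS.items()
        if len(members & chosen) >= min_hits
    )


def best_subset(size, weight):
    """Exhaustively maximize separation + weight * number of enriched pathways."""
    def score(subset):
        return separated_pairs(subset) + weight * len(enriched_pathways(subset))
    return max(combinations(range(len(GENES)), size), key=score)


def explore(size=2, weights=(0.0, 0.5, 1.0, 2.0)):
    """Solve once per weight and pool how often each gene and pathway appears."""
    gene_counts, pathway_counts = Counter(), Counter()
    for w in weights:
        subset = best_subset(size, w)
        gene_counts.update(GENES[j] for j in subset)
        pathway_counts.update(enriched_pathways(subset))
    return gene_counts, pathway_counts
```

Calling `explore()` returns two counters; genes and pathways that recur across the parameter grid are the ones a pooled analysis of this kind would treat as robust. A realistic instance would need a MIP solver rather than enumeration, since the number of gene subsets grows combinatorially.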

Keywords: Gene expression; Pathways; Statistical significance; Global optimization; MIP; Feature selection
Date: 2017
Downloads: (external link)
http://link.springer.com/10.1007/s10898-017-0530-0 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:jglopt:v:69:y:2017:i:3:d:10.1007_s10898-017-0530-0

Ordering information: This journal article can be ordered from
http://www.springer. ... search/journal/10898

DOI: 10.1007/s10898-017-0530-0

Journal of Global Optimization is currently edited by Sergiy Butenko
