Utility and Limitations of Using Gene Expression Data to Identify Functional Associations

Sahra Uygun, Cheng Peng, Melissa D Lehti-Shiu, Robert L Last and Shin-Han Shiu

PLOS Computational Biology, 2016, vol. 12, issue 12, 1-27

Abstract: Gene co-expression has been widely used to hypothesize gene function through guilt-by-association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about gene functions. Here our goal is to assess the utility and limitations of using co-expression as a criterion to recover functional associations between genes. By determining the percentage of gene pairs in a metabolic pathway with significant expression correlation, using Arabidopsis thaliana as an example, we found that many genes in the same pathway do not have similar transcript profiles, and that the choice of dataset, annotation quality, gene function, expression similarity measure, and clustering approach significantly impact the ability to recover functional associations between genes. Some datasets are more informative in capturing coordinated expression profiles, and larger datasets are not always better. In addition, to recover the maximum number of known pathways and identify candidate genes with similar functions, it is important to explore rather exhaustively multiple dataset combinations, similarity measures, clustering algorithms and parameters. Finally, we validated the biological relevance of co-expression cluster memberships with an independent phenomics dataset and found that genes that consistently cluster with leucine degradation genes tend to have similar leucine levels in mutants. This study provides a framework for obtaining gene functional associations by maximizing the information that can be obtained from gene expression datasets.

Author Summary: There remain genes with no known function even in the most well-studied model species. One common way to hypothesize gene function is based on the assumption that genes with similar expression profiles tend to have similar functions. However, using datasets and biological pathway information from the model plant Arabidopsis thaliana as an example, we discovered that, although genes in the same pathways are functionally related, genes in only a subset of the pathways have highly similar expression patterns. In addition, our ability to hypothesize gene functions based on expression is significantly impacted by how the dataset is processed and combined as well as the methodology used to identify genes with similar expression. Therefore, multiple datasets and methods should be tested to maximize the functional information that we can get based on similarity in gene expression.
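The abstract's central measurement, the percentage of gene pairs in a pathway with significant expression correlation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names, the use of Pearson correlation as the similarity measure, and the choice of a percentile of randomly drawn gene pairs as the significance threshold. The expression data is assumed to be a dictionary mapping a gene identifier to a vector of expression values across samples.

```python
import itertools

import numpy as np


def pairwise_pcc(expr, genes):
    """Pearson correlation for every unordered pair of the given genes.

    expr is a hypothetical dict: gene id -> sequence of expression values.
    """
    return {
        (a, b): float(np.corrcoef(expr[a], expr[b])[0, 1])
        for a, b in itertools.combinations(genes, 2)
    }


def background_threshold(expr, n_pairs=1000, percentile=95, seed=0):
    """Correlation cutoff taken from randomly drawn gene pairs.

    Assumption: a pair counts as co-expressed when its correlation
    reaches the given percentile of this random-pair background.
    """
    rng = np.random.default_rng(seed)
    genes = list(expr)
    values = []
    for _ in range(n_pairs):
        i, j = rng.choice(len(genes), size=2, replace=False)
        values.append(np.corrcoef(expr[genes[i]], expr[genes[j]])[0, 1])
    return float(np.percentile(values, percentile))


def fraction_coexpressed(expr, pathway_genes, threshold):
    """Share of within-pathway gene pairs whose correlation >= threshold."""
    pccs = pairwise_pcc(expr, pathway_genes)
    if not pccs:
        return 0.0
    return sum(r >= threshold for r in pccs.values()) / len(pccs)


if __name__ == "__main__":
    # Toy data: g1 and g2 rise together, g3 falls as they rise.
    toy = {
        "g1": [1.0, 2.0, 3.0, 4.0],
        "g2": [2.0, 4.0, 6.0, 8.0],
        "g3": [4.0, 3.0, 2.0, 1.0],
    }
    print(fraction_coexpressed(toy, ["g1", "g2", "g3"], threshold=0.9))
```

In the toy data only one of the three pairs (g1, g2) is positively correlated, so a low fraction for a pathway mirrors the abstract's observation that many genes in the same pathway do not share similar transcript profiles. Swapping the similarity measure or the background used for the threshold would change the result, which is the kind of sensitivity the abstract describes.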

Date: 2016
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005244 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 05244&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1005244

DOI: 10.1371/journal.pcbi.1005244

More articles in PLOS Computational Biology from Public Library of Science
 
Page updated 2025-03-19
Handle: RePEc:plo:pcbi00:1005244