Automated Discovery of Functional Generality of Human Gene Expression Programs
Georg K Gerber,
Robin D Dowell,
Tommi S Jaakkola and
David K Gifford
PLOS Computational Biology, 2007, vol. 3, issue 8, 1-15
Abstract:
An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-κB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. 
We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and genes from high generality programs may maintain common physiological responses that go awry in disease states. Further, our method is multipurpose, and can be applied readily to novel compendia of biological data.
Author Summary:
In recent years, DNA microarrays have been used to produce large compendia of human gene expression data, which are promising resources for discovery of expression programs, sets of co-expressed genes orchestrating important physiological or pathological processes. However, these compendia present particular challenges, including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. To address these challenges, we developed GeneProgram, a state-of-the-art statistical framework that automatically generates interpretable maps of expression programs from microarray data. GeneProgram accomplishes this by simultaneously organizing tissues into groups and genes into overlapping programs with consistent temporal behavior, and sorting programs by a generality score. Such maps may be valuable for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and genes from high generality programs may maintain common physiological responses that go awry in disease states. Using synthetic and real data, we showed that GeneProgram outperformed several popular expression analysis methods. Further, on a compendium of time-series gene expression data measuring the responses of human cells to infectious agents, GeneProgram discovered programs that implicate surprising signaling pathways and receptor types.
Date: 2007
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030148 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 30148&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:0030148
DOI: 10.1371/journal.pcbi.0030148
More articles in PLOS Computational Biology from Public Library of Science