Mining Massive Amounts of Genomic Data: A Semiparametric Topic Modeling Approach
Ethan X. Fang,
Min-Dian Li,
Michael I. Jordan and
Han Liu
Journal of the American Statistical Association, 2017, vol. 112, issue 519, 921-932
Abstract:
Characterizing the functional relevance of transcription factors (TFs) in different biological contexts is pivotal in systems biology. Given the massive amount of genomic data, computational identification of TFs is emerging as a useful approach to bridge functional genomics with disease risk loci. In this article, we use large-scale gene expression and chromatin immunoprecipitation (ChIP) data corpuses to conduct high-throughput TF-biological context association analysis. This work makes two contributions: (i) From a methodological perspective, we propose a unified topic modeling framework for exploring and analyzing large and complex genomic datasets. Under this framework, we develop new statistical optimization algorithms and semiparametric theoretical analysis, which are also applicable to a variety of large-scale data analyses. (ii) From an experimental perspective, our method generates an informative list of tumor-related TFs and their possible effected tumor types. Our data-driven analysis of 38 TFs in 68 tumor biological contexts identifies functional signatures of epigenetic regulators, such as SUZ12 and SET-DB1, and nuclear receptors, in many tumor types. In particular, the TF signature of SUZ12 is present in a broad range of tumor types, many of which have not been reported before. In summary, our work established a robust method to identify the association between TFs and biological contexts. Given the limited amount of genome-wide binding profiles of TFs and the massive number of expression profiles, our work provides a useful tool to deconvolute the gene regulatory network for tumors and other biological contexts. Supplementary materials for this article are available online.
Date: 2017
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
http://hdl.handle.net/10.1080/01621459.2016.1256812 (text/html)
Access to full text is restricted to subscribers.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:taf:jnlasa:v:112:y:2017:i:519:p:921-932
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/UASA20
DOI: 10.1080/01621459.2016.1256812
Access Statistics for this article
Journal of the American Statistical Association is currently edited by Xuming He, Jun Liu, Joseph Ibrahim and Alyson Wilson
More articles in Journal of the American Statistical Association from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst ().