Regulatory Motif Finding by Logic Regression
Sunduz Keles,
Mark van der Laan and
Chris Vulpe
Additional contact information
Mark van der Laan: Division of Biostatistics, School of Public Health, University of California, Berkeley
Chris Vulpe: Nutritional Science & Toxicology, University of California, Berkeley
No 1145, U.C. Berkeley Division of Biostatistics Working Paper Series from Berkeley Electronic Press
Abstract:
Multiple transcription factors coordinately control the transcriptional regulation of genes in eukaryotes. Although many computational methods address the identification of individual transcription factor binding sites (TFBSs), very few focus on the interactions between these sites. We consider finding TFBSs and their context-specific interactions using microarray gene expression data. We devise a hybrid approach called LogicMotif that combines a TFBS identification method with logic regression, the new regression methodology of Ruczinski et al. (2003). LogicMotif has two steps. First, potential binding sites are identified from the transcription control regions of the genes of interest. Various available methods can be used in this first step when the genes of interest can be divided into groups, such as up- and down-regulated genes. For this step, we also develop a simple univariate regression and extension method, MFURE, to extract candidate TFBSs from a large number of genes when microarray gene expression data are available. MFURE provides an alternative for this step when partitioning the genes into disjoint groups is not preferred. This first step aims to identify individual sites within gene groups of interest, or sites that are correlated with the gene expression outcome. In the second step, logic regression is used to build a predictive model of the outcome of interest (either gene expression or up/down regulation) from these potential sites. This two-step approach creates a rich, diverse set of potential binding sites in the first step and, in the second, builds regression or classification models with logic regression, which is particularly good at identifying complex interactions. LogicMotif is applied to two publicly available data sets. A genome-wide gene expression data set of Saccharomyces cerevisiae is used for validation.
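As a rough illustration of the univariate-regression idea behind the first step, the sketch below scores every DNA word of length k by regressing expression on the word's count in each control region and ranking words by the absolute t-statistic. It is a hypothetical simplification, not MFURE itself: the extension step that grows candidate words into longer motifs is omitted, and the function names (`count_kmer`, `univariate_t`, `screen_kmers`) and the toy sequences and expression values are assumptions made for illustration.

```python
# Hypothetical univariate screen of DNA words against expression.
# Simplified stand-in for the first LogicMotif step; MFURE's extension
# step is NOT implemented here.
import math
from itertools import product


def count_kmer(seq, kmer):
    """Count (possibly overlapping) occurrences of kmer in seq."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def univariate_t(x, y):
    """Least-squares slope of y on x (with intercept) and its t-statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0 or n <= 2:
        return 0.0, 0.0  # word count does not vary, or too few genes
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    rss = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    sigma2 = rss / (n - 2)
    if sigma2 == 0:  # perfect fit: infinite t unless the slope is zero
        return beta, (math.copysign(math.inf, beta) if beta != 0 else 0.0)
    return beta, beta / math.sqrt(sigma2 / sxx)


def screen_kmers(promoters, expression, k, top=10):
    """Rank all 4**k words by |t| from a univariate regression on expression."""
    results = []
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        counts = [count_kmer(s, kmer) for s in promoters]
        beta, t = univariate_t(counts, expression)
        results.append((kmer, beta, t))
    results.sort(key=lambda r: abs(r[2]), reverse=True)
    return results[:top]


if __name__ == "__main__":
    # Made-up toy data: expression tracks presence of the word GATA.
    promoters = ["CCCCGATACCCC", "CCCCCCCCCCCC", "TTTTGATATTTT", "TTTTTTTTTTTT"]
    expression = [2.0, 0.1, 1.9, -0.1]
    for kmer, beta, t in screen_kmers(promoters, expression, k=4, top=3):
        print(f"{kmer}\tbeta={beta:.2f}\tt={t:.2f}")
```

Enumerating all 4^k words is only practical for short k; a method that extends short words into longer motifs, as the abstract describes for MFURE, avoids scoring every long word directly.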
The regression models obtained are interpretable, and their biological implications agree with known results. This analysis suggests that LogicMotif provides biologically more reasonable regression models than previous analyses of this data set with standard linear regression methods. A second Saccharomyces cerevisiae data set illustrates the use of LogicMotif for classification questions: we build a model that discriminates between up- and down-regulated genes in iron copper deficiency. LogicMotif identified one inductive motif and two repressor motifs in this data set. The inductive motif matches the binding site of the transcription factor Aft1p, which has a key role in regulating the uptake process. One of the novel repressor sites is frequently present in the transcription control regions of FeS genes. This site could represent a TFBS for an unknown transcription factor involved in repressing genes that encode FeS proteins in iron deficiency. We established the stability of the method with respect to the type of outcome variable by using both continuous and binary outcome variables for this data set. Our results indicate that logic regression, used in combination with cluster- or group-based binding site identification methods or with our proposed method MFURE, is a powerful and flexible alternative to linear-regression-based motif finding methods.
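To make the idea of a Boolean interaction between binding sites concrete, the sketch below searches all rules of the form `L1 AND L2` or `L1 OR L2`, where each literal is a motif-presence indicator or its negation, and keeps the rule with the fewest misclassifications of a binary up/down outcome. This is a deliberately tiny, exhaustive stand-in: logic regression as defined by Ruczinski et al. (2003) searches much larger logic trees (for example by simulated annealing) and uses them as terms in a regression model, which is not reproduced here. The function names, motif names, and toy data are hypothetical.

```python
# Hypothetical toy search for a two-literal Boolean rule over motif indicators.
# Not logic regression itself: no logic trees, no annealing, and the only score
# is the misclassification count on the training data.
from itertools import combinations, product


def literal(x, j, negate):
    """Value of motif indicator j (0/1) for one gene, optionally negated."""
    return 1 - x[j] if negate else x[j]


def eval_rule(x, rule):
    """Evaluate ((j1, neg1), op, (j2, neg2)) on one gene's indicator vector."""
    (j1, n1), op, (j2, n2) = rule
    a, b = literal(x, j1, n1), literal(x, j2, n2)
    return a & b if op == "AND" else a | b


def best_two_literal_rule(X, y):
    """Return (rule, errors) minimising misclassification of binary y."""
    best = None
    for j1, j2 in combinations(range(len(X[0])), 2):
        for n1, n2, op in product((False, True), (False, True), ("AND", "OR")):
            rule = ((j1, n1), op, (j2, n2))
            errors = sum(eval_rule(x, rule) != yi for x, yi in zip(X, y))
            if best is None or errors < best[1]:  # keep the first best rule
                best = (rule, errors)
    return best


def describe(rule, names):
    """Render a rule as readable text, e.g. 'm0 AND NOT m2'."""
    (j1, n1), op, (j2, n2) = rule

    def lit(j, n):
        return ("NOT " if n else "") + names[j]

    return f"{lit(j1, n1)} {op} {lit(j2, n2)}"


if __name__ == "__main__":
    # Toy genes: every presence/absence pattern of three hypothetical motifs;
    # "up-regulated" (1) exactly when motif 0 is present and motif 2 absent.
    X = [list(bits) for bits in product((0, 1), repeat=3)]
    y = [x[0] & (1 - x[2]) for x in X]
    rule, errors = best_two_literal_rule(X, y)
    print(describe(rule, ["motif_0", "motif_1", "motif_2"]), errors)
    # prints: motif_0 AND NOT motif_2 0
```

Exhaustive enumeration like this is feasible only for a handful of candidate sites and very short rules; larger rule spaces call for a stochastic search of the kind logic regression uses.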
Keywords: Microarray gene expression; transcription factors; regulatory motifs; logic regression; cross-validation; yeast
Date: 2004-07-11
Note: oai:bepress.com:ucbbiostat-1145
Download (PDF): http://www.bepress.com/cgi/viewcontent.cgi?article=1145&context=ucbbiostat
Persistent link: https://EconPapers.repec.org/RePEc:bep:ucbbio:1145