NR-2L: A Two-Level Predictor for Identifying Nuclear Receptor Subfamilies Based on Sequence-Derived Features
Pu Wang,
Xuan Xiao and
Kuo-Chen Chou
PLOS ONE, 2011, vol. 6, issue 8, 1-9
Abstract:
Nuclear receptors (NRs) are one of the most abundant classes of transcriptional regulators in animals. They regulate diverse functions, such as homeostasis, reproduction, development and metabolism. Therefore, NRs are a very important target for drug development. Nuclear receptors form a superfamily of phylogenetically related proteins and have been subdivided into different subfamilies due to their domain diversity. In this study, a two-level predictor, called NR-2L, was developed that can be used to identify a query protein as a nuclear receptor or not based on its sequence information alone; if it is, the prediction will be automatically continued to further identify it among the following seven subfamilies: (1) thyroid hormone like (NR1), (2) HNF4-like (NR2), (3) estrogen like, (4) nerve growth factor IB-like (NR4), (5) fushi tarazu-F1 like (NR5), (6) germ cell nuclear factor like (NR6), and (7) knirps like (NR0). The identification was made by the Fuzzy K nearest neighbor (FK-NN) classifier based on the pseudo amino acid composition formed by incorporating various physicochemical and statistical features derived from the protein sequences, such as amino acid composition, dipeptide composition, complexity factor, and low-frequency Fourier spectrum components. As a demonstration, it was shown through some benchmark datasets derived from the NucleaRDB and UniProt with low redundancy that the overall success rates achieved by the jackknife test were about 93% and 89% in the first and second level, respectively. The high success rates indicate that the novel two-level predictor can be a useful vehicle for identifying NRs and their subfamilies. As a user-friendly web server, NR-2L is freely accessible at either http://icpr.jci.edu.cn/bioinfo/NR2L or http://www.jci-bioinfo.cn/NR2L. Each job submitted to NR-2L can contain up to 500 query protein sequences and be finished in less than 2 minutes. The less the number of query proteins is, the shorter the time will usually be. All the program codes for NR-2L are available for non-commercial purpose upon request.
Date: 2011
References: View complete reference list from CitEc
Citations: View citations in EconPapers (3)
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0023505 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 23505&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0023505
DOI: 10.1371/journal.pone.0023505
Access Statistics for this article
More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().