EconPapers    
Economics at your fingertips  
 

Genome-wide identification and characterization of DNA enhancers with a stacked multivariate fusion framework

Yansong Wang, Zilong Hou, Yuning Yang, Ka-chun Wong and Xiangtao Li

PLOS Computational Biology, 2022, vol. 18, issue 12, 1-33

Abstract: Enhancers are short non-coding DNA sequences outside of the target promoter regions that can be bound by specific proteins to increase a gene’s transcriptional activity, which has a crucial role in the spatiotemporal and quantitative regulation of gene expression. However, enhancers do not have a specific sequence motifs or structures, and their scattered distribution in the genome makes the identification of enhancers from human cell lines particularly challenging. Here we present a novel, stacked multivariate fusion framework called SMFM, which enables a comprehensive identification and analysis of enhancers from regulatory DNA sequences as well as their interpretation. Specifically, to characterize the hierarchical relationships of enhancer sequences, multi-source biological information and dynamic semantic information are fused to represent regulatory DNA enhancer sequences. Then, we implement a deep learning–based sequence network to learn the feature representation of the enhancer sequences comprehensively and to extract the implicit relationships in the dynamic semantic information. Ultimately, an ensemble machine learning classifier is trained based on the refined multi-source features and dynamic implicit relations obtained from the deep learning-based sequence network. Benchmarking experiments demonstrated that SMFM significantly outperforms other existing methods using several evaluation metrics. In addition, an independent test set was used to validate the generalization performance of SMFM by comparing it to other state-of-the-art enhancer identification methods. Moreover, we performed motif analysis based on the contribution scores of different bases of enhancer sequences to the final identification results. Besides, we conducted interpretability analysis of the identified enhancer sequences based on attention weights of EnhancerBERT, a fine-tuned BERT model that provides new insights into exploring the gene semantic information likely to underlie the discovered enhancers in an interpretable manner. Finally, in a human placenta study with 4,562 active distal gene regulatory enhancers, SMFM successfully exposed tissue-related placental development and the differential mechanism, demonstrating the generalizability and stability of our proposed framework.Author summary: Numerous evidence suggest that genes regulated by enhancers located in non-coding DNA regions are involved in a myriad of biological activities. To fully understand the regulatory role and mechanisms of enhancers on genes, the localization and identification of enhancers is essential. Several experimental biological methods are capable of localizing enhancers, however, these methods are resource intensive. To address this limitation, we developed a stacked multivariate fusion framework, called SMFM to identify and analyze enhancers with high accuracy and efficiency based on enhancer-specific dynamic semantic information and multi-source biological properties. The performance of the model is verified by experiments comparing different feature algorithms and classification algorithms. The superiority of our method is demonstrated by comparing it with several state-of-the-art algorithms. In addition, several analytical experiments demonstrate that SMFM is capable of recognizing enhancers in different tissues and detecting motifs in enhancers. To the best of our knowledge, this is the first computational approach that uses enhancer-specific dynamic semantic information to identify enhancers from regulatory DNA sequences and interpret them. It is expected that the SMFM model will effectively target enhancers and provide valid candidates for further biological experiments.

Date: 2022
References: Add references at CitEc
Citations:

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010779 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 10779&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1010779

DOI: 10.1371/journal.pcbi.1010779

Access Statistics for this article

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol ().

 
Page updated 2025-05-03
Handle: RePEc:plo:pcbi00:1010779