EconPapers

MoCETSE: A mixture-of-convolutional experts and transformer-based model for predicting Gram-negative bacterial secreted effectors

Hua Shi, Yihang Lin, Dachen Liu and Quan Zou

PLOS Computational Biology, 2026, vol. 22, issue 3, 1-23

Abstract: Identifying effector proteins of secretion systems in Gram-negative bacteria is crucial for deciphering their pathogenic mechanisms and guiding the development of antimicrobial strategies. Extracting evolutionary and sequence features using pre-trained protein language models (PLMs) has emerged as an effective approach to improve the performance of effector protein prediction. However, the high-dimensional features generated by PLMs contain extensive general biological information, making it difficult to focus on core features when applied directly to effector protein tasks, which in turn limits prediction performance. In this study, we propose MoCETSE, a deep learning model for predicting effector proteins in Gram-negative bacteria. Specifically, MoCETSE first extracts contextual representations of sequences using the pre-trained protein language model ESM-1b. Subsequently, it refines key functional features via a target preprocessing network to construct more expressive sequence representations. Finally, integrated with a transformer module incorporating relative positional encoding, MoCETSE explicitly models the relative spatial relationships between residues, enabling highly accurate prediction of secreted effector proteins. MoCETSE exhibits excellent and robust performance in both five-fold cross-validation and independent testing. Benchmark results demonstrate that it maintains strong competitiveness compared to existing binary and multi-class predictors. Additionally, the model can effectively perform genome-wide effector protein prediction, showing outstanding specificity and reliability. MoCETSE provides an efficient and robust computational framework for the accurate identification of bacterial effector substrates and offers key biological insights.

Author summary: Secreted effector proteins are a class of key virulence factors in Gram-negative bacteria. After being injected into host cells, they interfere with normal cellular functions, leading to the development of diseases. Accurate identification of these virulence proteins is crucial for understanding bacterial pathogenic mechanisms and developing therapeutic strategies. However, existing methods suffer from issues such as feature redundancy and insufficient capture of long-range dependency signals. Here, we developed a novel computational framework called MoCETSE that enables end-to-end intelligent prediction of effector proteins directly from raw protein sequence information. The model leverages a pre-trained protein language model to extract deep biological information from raw sequences; a target preprocessing network then refines the extracted information to focus on features most relevant to effector protein identification. During the learning of secretion signal features, we introduced relative positional encoding to effectively capture associations between distant positions in the sequence. In cross-category prediction, MoCETSE outperformed tools such as DeepSecE. Furthermore, we provide interpretable biological mechanisms supporting the model, revealing which key sequence motifs and functional regions play core roles in distinguishing different types of effector proteins.
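The abstract describes a transformer module with relative positional encoding, so that attention between two residues depends on their offset rather than their absolute positions. The sketch below is a generic, single-head illustration of one common variant of that idea (a learned scalar bias per clipped offset j - i, added to the attention logits); it is not MoCETSE's published implementation, and the names `relative_self_attention`, `rel_bias` and `max_dist` are hypothetical. Small random vectors stand in for per-residue ESM-1b embeddings, which in practice are 1280-dimensional.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, x):
    # W has shape out_dim x in_dim; x has length in_dim.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relative_self_attention(X, Wq, Wk, Wv, rel_bias, max_dist):
    """Single-head self-attention with a learned relative-position bias.

    X: list of L residue embeddings, each of length d_model.
    rel_bias: list of 2*max_dist+1 scalars (hypothetical parameterization);
    rel_bias[k + max_dist] is added to the logit for key offset k = j - i,
    with k clipped to [-max_dist, max_dist].
    Returns (outputs, attention_weights).
    """
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    scale = 1.0 / math.sqrt(len(Q[0]))
    outputs, attn = [], []
    for i, q in enumerate(Q):
        logits = []
        for j, k in enumerate(K):
            # Offsets beyond max_dist share the same bias entry.
            offset = max(-max_dist, min(max_dist, j - i))
            content = sum(a * b for a, b in zip(q, k)) * scale
            logits.append(content + rel_bias[offset + max_dist])
        weights = softmax(logits)
        attn.append(weights)
        # Weighted sum of value vectors for residue i.
        outputs.append([sum(w * v[c] for w, v in zip(weights, V))
                        for c in range(len(V[0]))])
    return outputs, attn


def rand_matrix(rows, cols, std=0.1):
    return [[random.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


# Toy demo: random stand-ins for per-residue embeddings of a 6-residue sequence.
random.seed(0)
L, d_model, d_k, max_dist = 6, 8, 4, 2
X = [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(L)]
Wq, Wk, Wv = (rand_matrix(d_k, d_model) for _ in range(3))
rel_bias = [0.0] * (2 * max_dist + 1)  # would be learned during training
out, attn = relative_self_attention(X, Wq, Wk, Wv, rel_bias, max_dist)
```

Because the bias is indexed by the clipped offset rather than by position, the number of positional parameters stays fixed at 2 * max_dist + 1 regardless of protein length, and the same learned preference for nearby or distant residues applies anywhere in the sequence.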

Date: 2026

Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013397 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13397&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013397

DOI: 10.1371/journal.pcbi.1013397

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol.

 
Page updated 2026-03-15
Handle: RePEc:plo:pcbi00:1013397