EconPapers    

Contrastive-learning of language embedding and biological features for cross modality encoding and effector prediction

Yue Peng, Junze Wu, Yi Sun, Yuanxing Zhang, Qiyao Wang and Shuai Shao
Yue Peng: East China University of Science and Technology
Junze Wu: East China University of Science and Technology
Yi Sun: East China University of Science and Technology
Yuanxing Zhang: Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai)
Qiyao Wang: East China University of Science and Technology
Shuai Shao: East China University of Science and Technology

Nature Communications, 2025, vol. 16, issue 1, 1-20

Abstract: Identifying and characterizing virulence proteins secreted by Gram-negative bacteria are fundamental for deciphering microbial pathogenicity and for aiding the development of therapeutic strategies. Effector predictors built on pre-trained protein language models (PLMs) have shown strong performance by leveraging extensive evolutionary and sequential protein features. However, accurate and sensitive effector prediction remains challenging. Here, we introduce a model named Contrastive-learning of Language Embedding and Biological Features (CLEF), which uses contrastive learning to integrate PLM representations with supplementary biological features. Biological information is captured in the learned contextualized embeddings to yield meaningful representations. With cross-modality biological features, CLEF outperforms state-of-the-art (SOTA) models in predicting type III, type IV, and type VI secreted effectors (T3SEs/T4SEs/T6SEs) in enteric pathogens. All experimentally verified effectors in enterohemorrhagic Escherichia coli and 41 of 43 experimentally verified T3SEs of Salmonella Typhimurium are recognized. Moreover, 12 predicted T3SEs and 11 predicted T6SEs were validated by extensive experiments in Edwardsiella piscicida. Furthermore, integrating omics data via the CLEF framework enhances protein representations, helping to illustrate effector-effector interactions and identify genes essential for in vivo colonization. Collectively, CLEF provides a blueprint for bridging the gap between the in silico capacity of PLMs and experimental biological information to accomplish complex tasks.

Date: 2025

Downloads: https://www.nature.com/articles/s41467-025-56526-1 (abstract, text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-56526-1

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-025-56526-1

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-22
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-56526-1