Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning
Ziyi Zhou,
Liang Zhang,
Yuanxi Yu,
Banghao Wu,
Mingchen Li,
Liang Hong () and
Pan Tan ()
Additional contact information
Ziyi Zhou: Shanghai Jiao Tong University
Liang Zhang: Shanghai Jiao Tong University
Yuanxi Yu: Shanghai Jiao Tong University
Banghao Wu: Shanghai Jiao Tong University
Mingchen Li: Shanghai Artificial Intelligence Laboratory
Liang Hong: Shanghai Jiao Tong University
Pan Tan: Shanghai Jiao Tong University
Nature Communications, 2024, vol. 15, issue 1, 1-13
Abstract:
Abstract Accurately modeling the protein fitness landscapes holds great importance for protein engineering. Pre-trained protein language models have achieved state-of-the-art performance in predicting protein fitness without wet-lab experimental data, but their accuracy and interpretability remain limited. On the other hand, traditional supervised deep learning models require abundant labeled training examples for performance improvements, posing a practical barrier. In this work, we introduce FSFP, a training strategy that can effectively optimize protein language models under extreme data scarcity for fitness prediction. By combining meta-transfer learning, learning to rank, and parameter-efficient fine-tuning, FSFP can significantly boost the performance of various protein language models using merely tens of labeled single-site mutants from the target protein. In silico benchmarks across 87 deep mutational scanning datasets demonstrate FSFP’s superiority over both unsupervised and supervised baselines. Furthermore, we successfully apply FSFP to engineer the Phi29 DNA polymerase through wet-lab experiments, achieving a 25% increase in the positive rate. These results underscore the potential of our approach in aiding AI-guided protein engineering.
Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.nature.com/articles/s41467-024-49798-6 Abstract (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-49798-6
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-024-49798-6
Access Statistics for this article
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().