Protein sequence modelling with Bayesian flow networks
Timothy Atkinson,
Thomas D. Barrett,
Scott Cameron,
Bora Guloglu,
Matthew Greenig,
Charlie B. Tan,
Louis Robinson,
Alex Graves,
Liviu Copoiu and
Alexandre Laterre
Additional contact information
Timothy Atkinson: InstaDeep
Thomas D. Barrett: InstaDeep
Scott Cameron: InstaDeep
Bora Guloglu: InstaDeep
Matthew Greenig: InstaDeep
Charlie B. Tan: InstaDeep
Louis Robinson: InstaDeep
Alex Graves: InstaDeep
Liviu Copoiu: InstaDeep
Alexandre Laterre: InstaDeep
Nature Communications, 2025, vol. 16, issue 1, 1-14
Abstract:
Exploring the vast and largely uncharted territory of amino acid sequences is crucial for understanding complex protein functions and engineering novel therapeutic proteins. Whilst generative machine learning has advanced protein sequence modelling, no existing approach is proficient in both unconditional and conditional generation. In this work, we propose that Bayesian Flow Networks (BFNs), a recently introduced framework for generative modelling, can address these challenges. We present ProtBFN, a 650M parameter model trained on protein sequences curated from UniProtKB, which generates natural-like, diverse, structurally coherent, and novel protein sequences, significantly outperforming leading autoregressive and discrete diffusion models. Further, we fine-tune ProtBFN on heavy chains from the Observed Antibody Space to obtain an antibody-specific model, AbBFN, which we use to evaluate zero-shot conditional generation capabilities. AbBFN is found to be competitive with or better than antibody-specific BERT-style models when applied to predicting individual framework or complementarity-determining regions.
Date: 2025
Downloads: https://www.nature.com/articles/s41467-025-58250-2 (abstract, text/html)
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-58250-2
Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/
DOI: 10.1038/s41467-025-58250-2
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.