Computational models for the classification of antibody specificity using heavy chain features
Jia Lin,
Jiaqi Chen,
Linxuan Wan,
Weinan He,
Yuxin Zhu,
Mu Qiao,
Fancun Meng,
Di Lin,
Yan Che and
Zicheng Cao
PLOS ONE, 2026, vol. 21, issue 5, 1-16
Abstract:
Background: Antibodies play a critical role in immune defense, with their antigen specificity primarily governed by the unique sequences of their heavy chains, rendering them invaluable tools in research and diagnostics. High-throughput sequencing technologies have facilitated comprehensive profiling of the immune repertoire, generating vast antibody sequence datasets that necessitate advanced analytical methods.
Methods: In this study, we utilized curated antibody sequences from NCBI databases to develop computational classification models for categorizing antibodies into predefined antigen classes. We extracted multifaceted features from the heavy chain sequences, encompassing physicochemical properties, structural composition, sequence order, and evolutionary information. These features were input into machine-learning classifiers to predict antigen specificity across five classes of antibodies: anti-dengue virus, anti-influenza virus, anti-tetanus bacillus, anti-SARS-CoV-2, and anti-Mycobacterium tuberculosis.
Results: Five tree-based machine-learning models were employed, with CatBoost achieving the highest accuracy of 0.7713. To further enhance predictive performance, we developed a stacking model leveraging multiple algorithms, resulting in an improved accuracy of 0.7803. Additionally, a Feature-Based Transformer deep-learning architecture was implemented, yielding an accuracy of 0.7399 and an F1-score of 0.6761. To elucidate the key determinants of antibody-antigen interactions, we applied the SHAP framework to assess feature importance. Among the top 30 contributing features, those representing sequence order and evolutionary information predominated, with amino acids such as cysteine (C), isoleucine (I), histidine (H), and phenylalanine (F) exhibiting notable SHAP values. Notably, cysteine (Cys) emerged as the most influential feature, underscoring its critical role in antibody structure and function. Specific antibodies contributed variably to these key features; for instance, the anti-tuberculosis antibody accounted for approximately 11% of a sequence order feature associated with alanine (A), while the anti-SARS-CoV-2 antibody contributed about 9.26% to a feature associated with isoleucine (I).
Conclusions: Our study demonstrates the efficacy of machine-learning and deep-learning approaches in classifying antibodies into specific antigen categories, providing sequence-based insights into features associated with antibody specificity. These findings have significant implications for the mechanistic understanding, isolation, and development of potential therapeutic antibodies.
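Illustrative sketch: The abstract does not name the exact descriptors used for feature extraction. As a minimal sketch, assuming amino acid composition as a stand-in for the composition features and a gapped residue-pair frequency as a stand-in for a sequence-order feature, a heavy-chain sequence could be converted into numeric features roughly as follows. The function names and the toy sequence fragment are hypothetical and are not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in `seq`.

    Assumption: non-standard characters (e.g. 'X', gaps) are ignored.
    """
    seq = seq.upper()
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def gapped_pair_frequency(seq, first, second, gap=1):
    """Fraction of positions i with seq[i] == first and seq[i + gap] == second.

    A simple, hypothetical sequence-order descriptor; the paper's own
    sequence-order features may be defined differently.
    """
    seq = seq.upper()
    n_positions = len(seq) - gap
    if n_positions <= 0:
        return 0.0
    hits = sum(
        1
        for i in range(n_positions)
        if seq[i] == first and seq[i + gap] == second
    )
    return hits / n_positions


if __name__ == "__main__":
    # Toy heavy-chain fragment, used only to exercise the functions.
    fragment = "EVQLVESGGGLVQPGGSLRLSCAAS"
    composition = amino_acid_composition(fragment)
    print("Cys fraction:", composition["C"])
    print("G-G adjacent pair frequency:", gapped_pair_frequency(fragment, "G", "G"))
```

Feature dictionaries of this kind would then be stacked into a table, one row per antibody, before being passed to a classifier such as the tree-based models mentioned in the abstract.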
Date: 2026
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0349143 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 49143&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0349143
DOI: 10.1371/journal.pone.0349143
More articles in PLOS ONE from Public Library of Science