Large-scale paired chain BCR analysis reveals antibody clonal family inference bias and enhances resolution with machine learning

Hao Wang, Kaixuan Wang, Qihang Xu, Linru Cai, Chuanxiang Huang, Linlin Chen, Yunliang Zang, Xihao Hu and Jian Zhang

PLOS Computational Biology, 2026, vol. 22, issue 3, 1-22

Abstract: A fundamental question in immunology is how the adaptive immune system encodes antigen specificity while maintaining repertoire diversity. B cell receptor (BCR) or antibody clonal families, defined as groups of B cells descending from a common ancestor, are key to deciphering this encoding. Although paired heavy and light chains jointly determine antibody specificity, most repertoire analyses have historically relied on heavy-chain-only data because bulk BCR sequencing loses native pairing information. This reliance introduces potential biases in computational clonal cluster inference, which may complicate efforts to resolve disease-associated immune signatures. Here, we leverage large-scale paired-chain BCR sequencing data to demonstrate that heavy-chain-based clustering may misrepresent true clonal architecture, and we identify two major artifacts: chain-mixed clusters, in which similar heavy chains are paired with distinct light chains, and naive-like pseudo-clonal clusters, which are detected in an individual’s naive B cell repertoire and exhibit highly similar heavy and light chains without reflecting true clonal expansion. To address these limitations, we present fastBCR-p, an optimized framework that integrates light-chain-informed subclustering with public-sequence-aware refinement to improve clonal family inference. By resolving both technical artifacts and biological convergence, fastBCR-p improves the chain concordance and overall clustering quality of clonal inference in real-world datasets. This enables more accurate tracking of immune dynamics in health and disease and facilitates the identification of clinically relevant antibody lineages.

Author summary: Our immune system protects us by producing a vast and diverse collection of antibodies, each designed to recognize a specific target. These antibodies are made by B cells, which expand and evolve in groups known as clonal families. Accurately identifying these clonal families from sequencing data is essential for understanding immune responses during infection, vaccination, and disease. Most existing computational methods infer B-cell clonal families using information from only one part of the antibody, the heavy chain. This limitation largely reflects the fact that traditional sequencing technologies often lose information about how heavy and light chains are naturally paired. However, both chains are required to define antibody specificity. Using large-scale datasets that preserve native heavy–light chain pairing, we show that heavy-chain-only approaches can introduce systematic errors. These include incorrectly grouping together unrelated B cells and falsely identifying naive B cells as expanded clones. To address these limitations, we developed fastBCR-p, which integrates light-chain information and accounts for shared (“public”) antibody sequences. By correcting both technical artifacts and biological convergence, fastBCR-p enables more accurate clonal family inference, improves the analysis of immune repertoire dynamics, and facilitates the identification of clinically relevant antibody lineages.
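
The idea of a chain-mixed cluster and of light-chain-informed subclustering can be illustrated with a minimal Python sketch. This is not the fastBCR-p implementation, whose actual criteria are not given in this record. It assumes, purely for illustration, that each cell already carries a heavy-chain cluster label and that two light chains count as "distinct" when their V gene or CDR3 length differs. The field names (`heavy_cluster`, `light_v`, `light_cdr3`) and the toy records are hypothetical.

```python
from collections import defaultdict

# Toy paired-chain records (made-up data, for illustration only).
cells = [
    {"cell_id": "c1", "heavy_cluster": 1, "light_v": "IGKV1-39", "light_cdr3": "CQQSYSTPLTF"},
    {"cell_id": "c2", "heavy_cluster": 1, "light_v": "IGKV1-39", "light_cdr3": "CQQSYSTPWTF"},
    {"cell_id": "c3", "heavy_cluster": 1, "light_v": "IGLV2-14", "light_cdr3": "CSSYTSSSTLVF"},
    {"cell_id": "c4", "heavy_cluster": 2, "light_v": "IGKV3-20", "light_cdr3": "CQQYGSSPWTF"},
    {"cell_id": "c5", "heavy_cluster": 2, "light_v": "IGKV3-20", "light_cdr3": "CQQYGSSPLTF"},
]


def light_key(cell):
    """Assumed light-chain signature: V gene plus CDR3 length."""
    return (cell["light_v"], len(cell["light_cdr3"]))


def chain_mixed_clusters(cells):
    """Return heavy-chain clusters whose members carry more than one light-chain signature."""
    signatures = defaultdict(set)
    for cell in cells:
        signatures[cell["heavy_cluster"]].add(light_key(cell))
    return sorted(c for c, keys in signatures.items() if len(keys) > 1)


def split_by_light_chain(cells):
    """Split each heavy-chain cluster into subclusters that share a light-chain signature."""
    subclusters = defaultdict(list)
    for cell in cells:
        subclusters[(cell["heavy_cluster"],) + light_key(cell)].append(cell["cell_id"])
    return dict(subclusters)


mixed = chain_mixed_clusters(cells)
subclusters = split_by_light_chain(cells)
```

In this toy input, heavy-chain cluster 1 mixes a kappa and a lambda light chain, so it is flagged and split into two subclusters, while cluster 2 stays intact. A real pipeline would likely use sequence similarity rather than exact signature matching.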

Date: 2026

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014077 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14077&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014077

DOI: 10.1371/journal.pcbi.1014077

More articles in PLOS Computational Biology from Public Library of Science
Page updated 2026-03-15
Handle: RePEc:plo:pcbi00:1014077