Predicting B cell receptor substitution profiles using public repertoire data
Amrit Dhar,
Kristian Davidsen,
Frederick A Matsen IV and
Vladimir N Minin
PLOS Computational Biology, 2018, vol. 14, issue 10, 1-24
Abstract:
B cells develop high-affinity receptors during the course of affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same “clonal family”) are released from the germinal center; their amino acid frequency profile reflects the allowed and disallowed substitutions at each position. These clonal-family-specific frequency profiles, called “substitution profiles”, are useful for studying the course of affinity maturation as well as for antibody engineering purposes. However, most often only a single sequence is recovered from each clonal family in a sequencing experiment, making it impossible to construct a clonal-family-specific substitution profile. Given the public release of many high-quality large B cell receptor datasets, one may ask whether it is possible to use such data in a prediction model for clonal-family-specific substitution profiles. In this paper, we present the method “Substitution Profiles Using Related Families” (SPURF), a penalized tensor regression framework that integrates information from a rich assemblage of datasets to predict the clonal-family-specific substitution profile for any single input sequence. Using this framework, we show that substitution profiles from similar clonal families can be leveraged together with simulated substitution profiles and germline gene sequence information to improve prediction. We fit this model on a large public dataset and validate the robustness of our approach on two external datasets. Furthermore, we provide a command-line tool in an open-source software package (https://github.com/krdav/SPURF) implementing these ideas and providing easy prediction using our pre-fit models.
Author summary: Antibody engineering can be greatly informed by knowledge about the underlying affinity maturation process. This process can be probed by sequencing, but in practice often only one member of the clonal family is sequenced, making it difficult to determine the set of possible amino acid mutations that would retain the original antibody-antigen binding affinity. We overcome this data sparsity by developing a statistical learning approach that leverages the vast information about amino acid preferences available in public immune system repertoire data. We use a penalized regression approach to devise a flexible statistical model that integrates multiple sources of information into a coherent prediction framework, and we validate our prediction algorithm using subsampling and held-out data.
Date: 2018
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006388 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 06388&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1006388
DOI: 10.1371/journal.pcbi.1006388
More articles in PLOS Computational Biology from Public Library of Science