Identifying T Cell Receptors from High-Throughput Sequencing: Dealing with Promiscuity in TCRα and TCRβ Pairing
Edward S Lee,
Paul G Thomas,
Jeff E Mold and
Andrew J Yates
PLOS Computational Biology, 2017, vol. 13, issue 1, 1-25
Abstract:
Characterisation of the T cell receptors (TCR) involved in immune responses is important for the design of vaccines and immunotherapies for cancer and autoimmune disease. The specificity of the interaction between the TCR heterodimer and its peptide-MHC ligand derives largely from the juxtaposed hypervariable CDR3 regions on the TCRα and TCRβ chains, and obtaining the paired sequences of these regions is a standard for functionally defining the TCR. A brute force approach to identifying the TCRs in a population of T cells is to use high-throughput single-cell sequencing, but currently this process remains costly and risks missing small clones. Alternatively, CDR3α and CDR3β sequences can be associated using their frequency of co-occurrence in independent samples, but this approach can be confounded by the sharing of CDR3α and CDR3β across clones, commonly observed within epitope-specific T cell populations. The accurate, exhaustive, and economical recovery of TCR sequences from such populations therefore remains a challenging problem. Here we describe an algorithm for performing frequency-based pairing (alphabetr) that accommodates CDR3α- and CDR3β-sharing, cells expressing two TCRα chains, and multiple forms of sequencing error. The algorithm also yields accurate estimates of clonal frequencies.
Author Summary:
Our repertoires of T cell receptors (TCR) give our immune system the ability to recognise a huge diversity of foreign and self antigens, and identifying the TCRs involved in infectious disease, cancer, and autoimmune disease is important for designing vaccines and immunotherapies. The majority of T cells express a TCR made up of two chains, the TCRα and TCRβ, and high-throughput sequencing of samples of T cells results in the loss of this pairing information. One can identify TCRαβ clones using single-cell sequencing, but this is costly and typically probes only part of the diversity of T cell populations.
Statistical approaches are potentially more powerful: they sequence the TCRα and TCRβ chains in multiple samples of T cells and pair them by their frequency of co-occurrence. However, T cells involved in immune responses frequently share TCRα and TCRβ chains with other responding cells. This promiscuity, combined with a high prevalence of T cells with two TCRα chains and sequencing errors, presents significant challenges to frequency-based pairing methods. Here we present a new algorithm that addresses these challenges and also provides accurate estimates of the abundances of T cell clonotypes, allowing us to build a more complete picture of T cell responses.
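The general idea of pairing chains by co-occurrence across independent samples can be illustrated with a minimal Python sketch. This is not the alphabetr algorithm described in the article. The function name `cooccurrence_scores`, the use of a Jaccard index as the association score, and the toy well contents are all assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(wells):
    """Score every observed (alpha, beta) pair by a Jaccard index.

    `wells` is a list of (alphas, betas) tuples, each a set of CDR3
    labels detected in one independent sample. The Jaccard score is an
    illustrative choice, not the statistic used by alphabetr.
    """
    alpha_counts = Counter()  # number of wells containing each alpha
    beta_counts = Counter()   # number of wells containing each beta
    pair_counts = Counter()   # number of wells containing both chains

    for alphas, betas in wells:
        alpha_counts.update(alphas)
        beta_counts.update(betas)
        pair_counts.update(product(alphas, betas))

    scores = {}
    for (a, b), both in pair_counts.items():
        either = alpha_counts[a] + beta_counts[b] - both
        scores[(a, b)] = both / either
    return scores


# Hypothetical toy data: four wells with made-up chain labels.
wells = [
    ({"a1", "a2"}, {"b1", "b2"}),
    ({"a1"}, {"b1"}),
    ({"a2"}, {"b2"}),
    ({"a1", "a3"}, {"b1", "b3"}),
]
scores = cooccurrence_scores(wells)
```

In this toy data, chains that always appear together score highest, while chains that merely happen to land in the same well score lower. A naive score of this kind has no way to represent a chain that is genuinely shared between clones or a cell carrying two TCRα chains, which are the complications the abstract says the published algorithm is designed to accommodate.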
Date: 2017
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005313 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 05313&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1005313
DOI: 10.1371/journal.pcbi.1005313