Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes

Weilong Zhao and Xinwei Sher

PLOS Computational Biology, 2018, vol. 14, issue 11, 1-28

Abstract: A number of machine learning-based predictors have been developed for identifying immunogenic T-cell epitopes based on major histocompatibility complex (MHC) class I and II binding affinities. Rationally selecting the most appropriate tool has been complicated by the evolving training data and machine learning methods. Despite recent advances in generating high-quality MHC-eluted, naturally processed ligandomes, the reliability of new predictors on these epitopes has yet to be evaluated. This study reports the latest benchmarking of an extensive set of MHC-binding predictors using newly available, untested data of both synthetic and naturally processed epitopes. The blind test set includes 32 human leukocyte antigen (HLA) class I and 24 HLA class II alleles. Artificial neural network (ANN)-based approaches demonstrated better performance than regression-based machine learning and structural modeling. Among the 18 predictors benchmarked, the ANN-based mhcflurry and nn_align perform best for MHC class I 9-mer and class II 15-mer predictions, respectively, on binding/non-binding classification (area under the curve = 0.911). NetMHCpan4 also demonstrated comparable predictive power. Our customization of mhcflurry to a pan-HLA predictor achieved accuracy similar to that of NetMHCpan. The overall accuracy of these methods is comparable between 9-mer and 10-mer testing data. However, the top methods deliver low correlations between predicted and experimental affinities for strong MHC binders. When used on naturally processed MHC ligands, tools that have been trained on elution data (NetMHCpan4 and MixMHCpred) show better accuracy than pure binding affinity predictors. The false prediction rate varies considerably among HLA types and datasets. Finally, the structure-based predictor Rosetta FlexPepDock performs less well than the machine learning approaches. With our benchmarking of MHC-binding and MHC-elution predictors using comprehensive metrics, an unbiased view for establishing best practice in T-cell epitope prediction is presented, facilitating future development of methods in immunogenomics.

Author summary: Computationally predicting antigen peptide sequences that elicit a T-cell immune response has a broad and significant impact on vaccine design. The most widely accepted approach is to rely on machine learning classifiers trained on large-scale major histocompatibility complex (MHC)-binding peptide datasets. Because machine learning algorithms are under constant development and training data keep expanding, comprehensive benchmarking of existing algorithms on blind testing datasets is important for recognizing the pros and cons of different algorithms and for providing guidelines on specific applications. Here we present such a benchmarking study, characterizing the algorithms on a wide array of accuracy metrics and highlighting the best-in-class algorithms as well as their limitations. The rising concept that “naturally presented” antigen epitopes are more likely to generate an effective T-cell immune response has led us to also consider the accuracy of these machine learning algorithms in predicting naturally presented peptides. We demonstrate that recent advances in incorporating high-quality naturally presented peptide data from mass spectrometry experiments have improved the accuracy. Our benchmarking of machine learning predictors for MHC-binding and naturally presented MHC antigen peptides contributes to establishing best practice in computational T-cell epitope analysis, which also has implications for tumor neoantigen-based cancer vaccine discovery.
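
Illustrative sketch (not part of the article): the abstract scores binding/non-binding classification by area under the ROC curve. The minimal Python example below shows how such a value could be computed for one predictor. Every name and number in it is hypothetical, and the 500 nM binder cutoff is an assumed convention that the abstract does not state; the article's actual evaluation pipeline may differ.

```python
from itertools import product

# Hypothetical toy data: (measured IC50 in nM, predicted IC50 in nM).
# These values are invented for illustration and are not from the article.
PEPTIDES = [
    (12.0, 30.0),
    (45.0, 20.0),
    (300.0, 800.0),
    (2500.0, 1500.0),
    (9000.0, 400.0),
    (20000.0, 30000.0),
]

# Assumed binder cutoff; a commonly used convention, not taken from the abstract.
BINDER_THRESHOLD_NM = 500.0


def roc_auc(labels, scores):
    """Return the AUC as the probability that a randomly chosen binder
    scores higher than a randomly chosen non-binder (ties count as 0.5)."""
    pos = [s for is_binder, s in zip(labels, scores) if is_binder]
    neg = [s for is_binder, s in zip(labels, scores) if not is_binder]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Experimental labels: a peptide is a binder if its measured IC50 is at or below the cutoff.
labels = [measured <= BINDER_THRESHOLD_NM for measured, _ in PEPTIDES]
# A lower predicted IC50 means stronger predicted binding, so negate it to get a score.
scores = [-predicted for _, predicted in PEPTIDES]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise formulation is quadratic in the number of peptides, which is acceptable for a toy example; a rank-based computation would be the usual choice for large test sets.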

Date: 2018

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006457 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 06457&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1006457

DOI: 10.1371/journal.pcbi.1006457

More articles in PLOS Computational Biology from Public Library of Science
 
Page updated 2025-03-19
Handle: RePEc:plo:pcbi00:1006457