PON-Del predictor for sequence retaining protein deletions
Haoyang Zhang,
Muhammad Kabir and
Mauno Vihinen
PLOS Computational Biology, 2026, vol. 22, issue 2, 1-18
Abstract:
Protein deletions are frequent among both disease-causing and tolerated variants. Several mechanisms at the DNA, RNA and protein levels can lead to deletions. Many deletions are misclassified in the literature and databases, especially when the mRNA is degraded by the cellular quality-control mechanism. We developed a novel predictor for sequence-retaining protein deletions, i.e., variants that do not alter the sequence downstream of the deletion site. We collected an extensive dataset of verified protein deletions, each described by a comprehensive set of context, content, position, and gene-based features. We evaluated both statistical and deep learning algorithms and selected a gradient boosting–based approach to develop the PON-Del predictor for short (1–10 amino acid) sequence-retaining deletions. Variants are typically classified into two categories: pathogenic or benign. However, there is always a third class: variants of uncertain significance (VUSs), which have been ignored by all previous methods. PON-Del is the first deletion interpretation method that includes VUSs. It provides two outputs: a binary prediction and a three-state prediction that includes VUSs. The performance of PON-Del was superior to that of previous methods. The tool is freely available at https://structure.bmc.lu.se/pon_del/.
Author summary:
Protein deletions are frequent among both disease-causing and tolerated variants, and are caused by several mechanisms at the DNA, RNA and protein levels. Reliable prediction of the effects of deletions is challenging. We developed a predictor for sequence-retaining protein deletions, i.e., variants that do not alter the sequence beyond the deletion site. We collected an extensive dataset of verified protein deletions and a comprehensive set of features to describe them. We evaluated seven algorithms and selected a gradient boosting–based approach to develop the PON-Del predictor for short (1–10 amino acid) sequence-retaining deletions. Variants have typically been classified as pathogenic or benign. This practice misses the third category: variants of uncertain significance (VUSs). PON-Del is the first deletion interpretation method that includes VUSs. The performance of PON-Del was superior to that of previous methods. The tool is freely available at https://structure.bmc.lu.se/pon_del/.
Date: 2026
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014020 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14020&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014020
DOI: 10.1371/journal.pcbi.1014020
More articles in PLOS Computational Biology from Public Library of Science