Analyzing the performance of deep learning splice prediction algorithms

Fortier, Nathan; Rudy, Gabe; Scherer, Andreas

PLOS ONE, 2026, vol. 21, issue 5, 1-16

Abstract: SpliceAI is the leading tool for predicting splice-altering variants, but restrictive licensing limits clinical adoption. While open-source implementations have been published with author-reported comparisons, independent benchmarking across diverse datasets is needed to establish equivalence. We compared the original SpliceAI with two open-source implementations (OpenSpliceAI and CI-SpliceAI) and a legacy ensemble baseline across six datasets: a curated set of 1,316 validated variants, 213 variants with splice-assay data, 99,601 variants from the SPiP splicing prediction study, 242 manually curated deep intronic pathogenic variants, and two ClinVar-derived datasets comprising 53,600 intronic variants and 58,064 variants spanning all genomic contexts. The deep learning models were also evaluated against an ensemble of four legacy splice-prediction tools. Across all datasets, the deep learning algorithms outperformed the legacy ensemble. All three deep learning algorithms showed similar performance on the larger datasets dominated by canonical splice site variants (balanced accuracies 0.889-0.977). On the deep intronic benchmark, the original SpliceAI achieved the highest balanced accuracy (0.940), outperforming both CI-SpliceAI (0.890) and OpenSpliceAI (0.841). Critically, optimal thresholds for deep intronic variants were an order of magnitude lower than standard recommendations, indicating that default thresholds would miss the majority of pathogenic deep intronic variants. A correlation analysis showed that CI-SpliceAI maintained balanced concordance across event types, whereas OpenSpliceAI showed stronger correlation for loss events than gain events. Both implementations showed high positional agreement with SpliceAI, with exact splice-site match rates exceeding 90% across event types. 
Together, these results demonstrate that both open-source reimplementations of SpliceAI reproduce the predictive behavior of the original algorithm across multiple evaluation contexts, while consistently outperforming traditional splice prediction methods. However, performance diverges on deep intronic variants, and standard score thresholds are poorly calibrated for this variant class regardless of algorithm choice.
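The abstract's two central quantities, balanced accuracy and the score threshold that maximizes it, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the function names, the candidate thresholds, and the toy labels and scores are hypothetical and are not taken from the paper's data or code. The value 0.5 stands in for a conventional cutoff only; the abstract does not state which default thresholds it refers to.

```python
from typing import Sequence, Tuple


def balanced_accuracy(labels: Sequence[int], scores: Sequence[float],
                      threshold: float) -> float:
    """Mean of sensitivity and specificity when calling score >= threshold positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)  # fraction of pathogenic variants detected
    specificity = tn / (tn + fp)  # fraction of benign variants correctly passed
    return (sensitivity + specificity) / 2


def best_threshold(labels: Sequence[int], scores: Sequence[float],
                   candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, balanced accuracy) for the best candidate threshold."""
    scored = [(t, balanced_accuracy(labels, scores, t)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data (NOT from the paper): 1 = pathogenic, 0 = benign.
# The pathogenic scores are deliberately low to mimic a deep intronic setting.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.03, 0.08, 0.15, 0.60, 0.00, 0.01, 0.01, 0.02]
toy_candidates = [0.01, 0.03, 0.05, 0.1, 0.2, 0.5]

if __name__ == "__main__":
    print("balanced accuracy at 0.5:",
          balanced_accuracy(toy_labels, toy_scores, 0.5))
    print("best (threshold, balanced accuracy):",
          best_threshold(toy_labels, toy_scores, toy_candidates))
```

On this toy set a cutoff of 0.5 detects only one of the four pathogenic variants, while a much lower cutoff separates the two classes. That is the kind of calibration gap the abstract describes for deep intronic variants, though the real magnitudes must come from the paper itself.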

Date: 2026

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0348885 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 48885&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0348885

DOI: 10.1371/journal.pone.0348885
