GrassSV – hybrid method to detect structural variants in high throughput DNA-seq data
Dominik Witczak,
Krzysztof Sychla,
Julia Wysocka,
Artur Laskowski,
Wojciech Frohmberg,
Marta Glowacka,
Alicja Dzik,
Piotr Lukasiak,
Jacek Blazewicz and
Aleksandra Swiercz
PLOS Computational Biology, 2026, vol. 22, issue 6, 1-14
Abstract:
Genetic diversity is crucial for populations to adapt and survive in dynamic environments. This diversity arises from genetic mutations, which manifest in the genome as structural variants (SVs). Several types of SVs exist, but not all are equally easy to detect. Current SV detection tools tend to specialize in certain SV types or require the use of multiple tools to obtain a comprehensive variant profile, which increases computational cost and complexity. While some methods excel at identifying breakpoints, they often struggle with accurately classifying variant types, and their precision depends strongly on data quality and sequencing technology. At present, the majority of available genomic data originates from high-quality short reads, which remain the most affordable sequencing technology. In this manuscript, we introduce GrassSV, a novel and computationally efficient method that employs a hybrid pattern-matching approach to detect all major classes of structural variants using short-read sequencing data. GrassSV integrates depth-of-coverage analysis with contig-based pattern recognition to ensure both sensitivity and precision while minimizing false positives and runtime. Its robustness was demonstrated on the human Genome in a Bottle dataset, as well as on synthetic data derived from the yeast genome, where it achieved high accuracy across all SV types at a lower computational cost compared to existing methods. This makes GrassSV a practical alternative to multi-tool pipelines typically required for comprehensive SV detection. GrassSV is available at https://github.com/Domomod/GrassSV under GPL-3.0 license and the benchmark at: https://github.com/Domomod/GrassBenchmark.
Author summary:
Structural variants (SVs) are large genomic alterations that can profoundly influence gene function, regulation, and phenotype. Despite their biological importance, accurately detecting SVs from sequencing data remains a major computational challenge. Existing tools are often optimized for specific types of variants or rely on multiple algorithms to achieve full coverage, which increases computational cost and complexity. In this study, we present GrassSV, a hybrid approach for structural variant detection using short-read sequencing data. Our method combines coverage-based analysis with pattern recognition from assembled contigs, enabling comprehensive identification of deletions, insertions, inversions, duplications, and translocations within a single pipeline. We developed GrassSV to provide researchers with a practical, accurate, and efficient tool for large-scale genome analyses. We evaluated GrassSV on both synthetic and real datasets, showing that it detects all major SV types with high precision while reducing false positives and runtime compared to existing methods. By balancing accuracy and efficiency, GrassSV offers a cost-effective solution for genomic research and supports ongoing efforts to understand genetic variability in human and model organism populations.
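The coverage-based half of such a hybrid approach can be illustrated with a minimal sketch. This is not GrassSV's implementation: the function name `flag_coverage_windows`, the window size, and the ratio thresholds are all assumptions chosen for illustration, and the abstract does not state how GrassSV actually processes depth of coverage. The sketch only shows the general idea that windows with unusually low read depth suggest a deletion and windows with unusually high depth suggest a duplication.

```python
from statistics import median


def flag_coverage_windows(depth, window=100, low=0.5, high=1.5):
    """Flag fixed-size windows whose mean depth deviates from the baseline.

    depth  -- per-base read depth along one reference sequence (list of numbers)
    window -- window size in bases (hypothetical default)
    low    -- ratio at or below which a window is a candidate deletion (assumed)
    high   -- ratio at or above which a window is a candidate duplication (assumed)
    Returns a list of (start, end, label) tuples with half-open coordinates.
    """
    # Mean depth of each consecutive, non-overlapping window.
    means = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        means.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    if not means:
        return []

    # Use the median window mean as the expected coverage.
    baseline = median(m for _, _, m in means)
    if baseline == 0:
        return []

    calls = []
    for start, end, mean in means:
        ratio = mean / baseline
        if ratio <= low:
            calls.append((start, end, "candidate deletion"))
        elif ratio >= high:
            calls.append((start, end, "candidate duplication"))
    return calls


# Toy depth profile: ~30x coverage with one depleted and one amplified region.
toy_depth = [30] * 300 + [2] * 100 + [30] * 200 + [65] * 100 + [30] * 300
for call in flag_coverage_windows(toy_depth):
    print(call)
```

Coverage alone cannot distinguish inversions or balanced translocations, which leave depth unchanged; that is why the abstract pairs it with pattern recognition on assembled contigs.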
Date: 2026
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014406 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14406&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014406
DOI: 10.1371/journal.pcbi.1014406