VIRify: An integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models
Guillermo Rangel-Pineros,
Alexandre Almeida,
Martin Beracochea,
Ekaterina Sakharova,
Manja Marz,
Alejandro Reyes Muñoz,
Martin Hölzer and
Robert D Finn
PLOS Computational Biology, 2023, vol. 19, issue 8, 1-28
Abstract:
The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterisation of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually-curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses, and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.
Author summary:
Viruses are the most abundant biological entities on our planet. Some are relevant pathogens for public health or agriculture. Still, many also play ecological roles that are critical for maintaining ecosystems.
Most viruses are yet to be cultured, so their identification and characterisation depend solely on the analysis of DNA or RNA obtained from the environment. Unlike cellular organisms, viruses also lack a universal genetic marker that allows taxonomic profiling of an environmental viral community. We have manually curated a set of specific viral protein models that serve as taxonomic markers for a comprehensive range of viral taxa. Using these protein models, we developed VIRify, a computational pipeline for the detection, annotation, and taxonomic classification of viral sequences obtained from environmental DNA or RNA. Our new pipeline was efficient in detecting and classifying sequences of viruses targeting bacteria or eukaryotic organisms in mock microbial communities, samples from the world’s oceans, and a previously assembled collection of human gut viruses. VIRify is user-friendly, requires minimal interaction with the command line, and was developed with portability in mind. VIRify can enhance the exploration of viral diversity in nature and support the detection of pathogenic viruses with pandemic potential.
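The idea of using taxon-specific protein models as markers can be illustrated with a small sketch. This is not VIRify's actual classification procedure, which the abstract does not specify; it is a simple majority-vote scheme under stated assumptions. The function name `classify_contig`, the `min_fraction` threshold of 0.6, and the input format (one taxon label per marker-model hit on a contig) are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def classify_contig(marker_taxa: Sequence[str], min_fraction: float = 0.6) -> Optional[str]:
    """Return the taxon backed by most marker hits, or None if support is weak.

    marker_taxa: one taxon label per marker-model hit on the contig
    (hypothetical input; producing these hits is out of scope here).
    min_fraction: assumed threshold, chosen only for illustration.
    """
    if not marker_taxa:
        return None
    taxon, count = Counter(marker_taxa).most_common(1)[0]
    # Accept the top taxon only when it accounts for enough of the hits.
    return taxon if count / len(marker_taxa) >= min_fraction else None


if __name__ == "__main__":
    # Example labels only; two of three hits agree, so the vote passes.
    print(classify_contig(["Microviridae", "Microviridae", "Inoviridae"]))
    # An even split falls below the assumed threshold, so no call is made.
    print(classify_contig(["Microviridae", "Inoviridae"]))
```

Leaving a contig unclassified when the hits disagree is one possible design choice: it favours fewer, more reliable assignments over complete coverage.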
Date: 2023
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011422 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 11422&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1011422
DOI: 10.1371/journal.pcbi.1011422
More articles in PLOS Computational Biology from Public Library of Science