VIRify: An integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models

Guillermo Rangel-Pineros, Alexandre Almeida, Martin Beracochea, Ekaterina Sakharova, Manja Marz, Alejandro Reyes Muñoz, Martin Hölzer and Robert D Finn

PLOS Computational Biology, 2023, vol. 19, issue 8, 1-28

Abstract: The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterisation of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.

Author summary: Viruses are the most abundant biological entities on our planet. Some are pathogens relevant to public health or agriculture. Still, many also play ecological roles that are critical for maintaining ecosystems. Most viruses are yet to be cultured, so their identification and characterisation depend solely on the analysis of DNA or RNA obtained from the environment. Unlike cellular organisms, viruses also lack a universal genetic marker that allows taxonomic profiling of an environmental viral community. We have manually curated a set of specific viral protein models that serve as taxonomic markers for a comprehensive range of viral taxa. Using these protein models, we developed VIRify, a computational pipeline for the detection, annotation, and taxonomic classification of viral sequences obtained from environmental DNA or RNA. Our new pipeline was efficient in detecting and classifying sequences of viruses targeting bacteria or eukaryotic organisms in mock microbial communities, samples from the world's oceans, and a previously assembled collection of human gut viruses. VIRify is user-friendly, requires minimal interaction with the command line, and was developed with portability in mind. VIRify can enhance the exploration of viral diversity in nature and support the detection of pathogenic viruses with pandemic potential.

Date: 2023

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011422 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 11422&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1011422

DOI: 10.1371/journal.pcbi.1011422

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol.

 
Handle: RePEc:plo:pcbi00:1011422