Petabase-scale sequence alignment catalyses viral discovery
Robert C. Edgar,
Brie Taylor,
Victor Lin,
Tomer Altman,
Pierre Barbera,
Dmitry Meleshko,
Dan Lohr,
Gherman Novakovsky,
Benjamin Buchfink,
Basem Al-Shayeb,
Jillian F. Banfield,
Marcos Peña,
Anton Korobeynikov,
Rayan Chikhi and
Artem Babaian
Additional contact information
Robert C. Edgar: Independent researcher
Brie Taylor: Independent researcher
Victor Lin: Independent researcher
Tomer Altman: Altman Analytics
Pierre Barbera: Heidelberg Institute for Theoretical Studies
Dmitry Meleshko: St Petersburg State University
Dan Lohr: Unaffiliated
Gherman Novakovsky: University of British Columbia
Benjamin Buchfink: Max Planck Institute for Biology
Basem Al-Shayeb: University of California, Berkeley
Jillian F. Banfield: University of California, Berkeley
Marcos Peña: Universidad Politécnica de Valencia–CSIC
Anton Korobeynikov: St Petersburg State University
Rayan Chikhi: G5 Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur
Artem Babaian: Independent researcher
Nature, 2022, vol. 602, issue 7895, 142-147
Abstract:
Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially¹. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10⁵ novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.
Date: 2022
Downloads:
https://www.nature.com/articles/s41586-021-04332-2 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:nat:nature:v:602:y:2022:i:7895:d:10.1038_s41586-021-04332-2
Ordering information: This journal article can be ordered from
https://www.nature.com/
DOI: 10.1038/s41586-021-04332-2
Nature is currently edited by Magdalena Skipper
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.