A deep learning approach to real-time HIV outbreak detection using genetic data

Michael D Kupperman, Thomas Leitner and Ruian Ke

PLOS Computational Biology, 2022, vol. 18, issue 10, 1-20

Abstract: Pathogen genomic sequence data are increasingly made available for epidemiological monitoring. A main interest is to identify infectious disease outbreaks and assess their potential. While popular methods to analyze sequence data often involve phylogenetic tree inference, they are vulnerable to errors from recombination and impose a high computational cost, making it difficult to obtain real-time results when the number of sequences is in or above the thousands. Here, we propose an alternative strategy for outbreak detection using genomic data, based on deep learning methods developed for image classification. The key idea is to treat a pairwise genetic distance matrix calculated from viral sequences as an image, and to develop convolutional neural network (CNN) models that classify areas of the image showing signatures of an active outbreak, leading to identification of subsets of sequences taken from an active outbreak. We showed that our method is efficient in finding HIV-1 outbreaks with R0 ≥ 2.5, with an overall specificity exceeding 98% and sensitivity better than 92%. We validated our approach using data from HIV-1 CRF01 in Europe, containing both endemic sequences and a well-known dual outbreak in intravenous drug users. Our model accurately identified known outbreak sequences against the background of slower-spreading HIV. Importantly, we detected both outbreaks early on, before they were over, implying that had this method been applied in real time as data became available, one would have been able to intervene and possibly limit the extent of these outbreaks. This approach is scalable to processing hundreds of thousands of sequences, making it useful for current and future real-time epidemiological investigations, including public health monitoring using large databases, and especially for rapid outbreak identification.

Author summary: The use of pathogen genomic data to analyze epidemics at scale is constrained by the computational cost associated with phylogenetic tree reconstruction. As a fast and efficient alternative, we employed convolutional neural networks to analyze evolutionary pairwise distance matrices as images and classify the current epidemiological situation of a growing public health sequence database. We used simulated data to train and test our model, and as validation we accurately mapped the start and end of two linked, well-documented HIV-1 outbreaks against the backdrop of ongoing slower HIV spread. Thus, our new approach is efficient, accurate, scalable, and can analyze data in real time.
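
As a rough illustration of the input described in the abstract, the sketch below builds a symmetric pairwise distance matrix from a set of aligned sequences. It is not the authors' implementation: the simple proportion-of-differing-sites distance (p-distance), the function names `p_distance` and `distance_matrix`, and the toy alignment are all assumptions made for this example, and the record does not state which distance model or image preprocessing the paper actually uses.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-N?") -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption for this sketch: sites where either sequence carries a gap
    or ambiguity character are skipped rather than counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    # Two sequences with no comparable sites are given distance 0.0 here.
    return mismatches / compared if compared else 0.0


def distance_matrix(sequences: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise p-distances with a zero diagonal."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = p_distance(sequences[i], sequences[j])
        matrix[i][j] = d
        matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    # Hypothetical toy alignment, not real HIV-1 data.
    toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAAC-A"]
    for row in distance_matrix(toy_alignment):
        print(" ".join(f"{d:.3f}" for d in row))
```

Each entry of the resulting matrix can be read as one pixel intensity, which is what allows an image classifier such as a CNN to be applied to it. The loop visits every pair of sequences, so its cost grows quadratically with the number of sequences.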

Date: 2022

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010598 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 10598&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1010598

DOI: 10.1371/journal.pcbi.1010598

More articles in PLOS Computational Biology from Public Library of Science

 
Page updated 2025-05-31
Handle: RePEc:plo:pcbi00:1010598