Machine learning identifies signatures of host adaptation in the bacterial pathogen Salmonella enterica

Nicole E Wheeler, Paul P Gardner and Lars Barquist

PLOS Genetics, 2018, vol. 14, issue 5, 1-20

Abstract: Emerging pathogens are a major threat to public health; however, understanding how pathogens adapt to new niches remains a challenge. New methods are urgently required to provide functional insights into pathogens from the massive genomic data sets now being generated from routine pathogen surveillance for epidemiological purposes. Here, we measure the burden of atypical mutations in protein coding genes across independently evolved Salmonella enterica lineages, and use these as input to train a random forest classifier to identify strains associated with extraintestinal disease. Members of the species fall along a continuum, from pathovars which cause gastrointestinal infection and low mortality, associated with a broad host range, to those that cause invasive infection and high mortality, associated with a narrowed host range. Our random forest classifier learned to perfectly discriminate long-established gastrointestinal and invasive serovars of Salmonella. Additionally, it was able to discriminate recently emerged Salmonella Enteritidis and Typhimurium lineages associated with invasive disease in immunocompromised populations in sub-Saharan Africa, and within-host adaptation to invasive infection. We dissect the architecture of the model to identify the genes that were most informative of phenotype, revealing a common theme of degradation of metabolic pathways in extraintestinal lineages. This approach accurately identifies patterns of gene degradation and diversifying selection specific to invasive serovars that have been captured by more labour-intensive investigations, but can be readily scaled to larger analyses.

Author summary: Researchers are now collecting a wealth of genomic data from bacterial pathogens, and this will continue to grow with the introduction of routine sequencing for disease surveillance. However, our ability to use this data to predict how changes in genome sequence lead to differences in disease is limited. Here, we have used machine learning to detect an enrichment in functionally significant mutations in genes associated with a shift in pathogenic niche. This approach captures convergence in functional outcomes that does not necessarily result in a convergence in sequence, facilitating the inclusion of rare variants of large effect in an analysis, and allowing for complex interactions between genes. We apply this approach to Salmonella, showing that we can detect changes associated with disease phenotype in emerging lineages associated with the HIV epidemic. This approach should be applicable to other bacterial species with lineages independently adapting to similar niches. We provide open-source implementations of both the predictive model and the workflow used to build it.
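
Illustrative sketch: the general idea described above (per-gene mutation-burden scores as features, a random forest as classifier, and feature importances to rank informative genes) might look roughly like the following. This is not the authors' implementation. The data are synthetic, the gene names and the size of the simulated effect are hypothetical, and the use of scikit-learn's RandomForestClassifier is an assumption made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical data set: 40 strains, 20 genes (all names are placeholders).
n_strains, n_genes = 40, 20
gene_names = [f"gene_{i}" for i in range(n_genes)]

# Synthetic per-gene mutation-burden scores: rows are strains, columns are genes.
X = rng.normal(loc=0.0, scale=1.0, size=(n_strains, n_genes))

# Labels: 0 = gastrointestinal, 1 = extraintestinal (invasive).
y = np.array([0] * 20 + [1] * 20)

# Assumption for the demo: three genes carry a much higher burden in invasive strains.
X[y == 1, :3] += 6.0

# Fit the forest; out-of-bag samples give an internal accuracy estimate.
forest = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, y)

# Rank genes by impurity-based importance, highest first.
ranked = sorted(
    zip(gene_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top_genes = [name for name, _ in ranked[:3]]

print("out-of-bag accuracy:", forest.oob_score_)
print("most informative genes:", top_genes)
```

In this toy setting the three perturbed genes should rank highest. Real surveillance data would need care that a synthetic example does not, such as accounting for phylogenetic relatedness between strains when estimating accuracy.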

Date: 2018

Downloads: (external link)
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007333 (text/html)
https://journals.plos.org/plosgenetics/article/fil ... 07333&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pgen00:1007333

DOI: 10.1371/journal.pgen.1007333

Page updated 2025-03-19
Handle: RePEc:plo:pgen00:1007333