Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data
Danesh Moradigaravand,
Martin Palm,
Anne Farewell,
Ville Mustonen,
Jonas Warringer and
Leopold Parts
PLOS Computational Biology, 2018, vol. 14, issue 12, 1-17
Abstract:
The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling the spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires prior knowledge of the underlying biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models with an average accuracy of 0.91 on held-out data (range 0.81–0.97). While the best models most frequently employed gene content, an average accuracy score of 0.79 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction only for two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves the way to integrating machine learning approaches into diagnostic tools in the clinic.
Author summary:
One of the major health threats of the 21st century is the emergence of antibiotic resistance. To manage its human health and economic impact, efforts are made to develop novel diagnostic tools that rapidly detect resistant strains in clinical settings.
In our study, we employed a range of powerful machine learning tools to predict antibiotic resistance from whole genome sequencing data for E. coli. We used the presence or absence of genes, population structure and isolation year of isolates as predictors, and attained an average precision of 0.92 and a recall of 0.83, without prior knowledge of the causal mechanisms. These results demonstrate the potential application of machine learning methods as a diagnostic tool in healthcare settings.
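The abstract and summary report accuracy, precision and recall for a binary resistant/susceptible call. As a reading aid, the following is a minimal sketch of how these three scores are conventionally computed from held-out predictions. It is not the authors' evaluation code: the function name `binary_metrics` and the toy label vectors are hypothetical and chosen only for illustration, with 1 standing for resistant and 0 for susceptible.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision and recall for binary labels.

    Convention assumed here: 1 = resistant (positive class), 0 = susceptible.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # resistant, called resistant
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # susceptible, called resistant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # resistant, called susceptible
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # susceptible, called susceptible

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Precision: fraction of predicted-resistant isolates that are truly resistant.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: fraction of truly resistant isolates that the model detects.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy data: 10 held-out isolates, not taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

scores = binary_metrics(y_true, y_pred)
print(scores)
```

In a diagnostic setting the two error types carry different costs: low recall means resistant strains are missed, while low precision means susceptible strains are flagged unnecessarily, which is why both are worth reporting alongside accuracy.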
Date: 2018
Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006258 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 06258&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1006258
DOI: 10.1371/journal.pcbi.1006258