Machine learning-driven identification of virulence determinants in Borrelia burgdorferi associated with human dissemination

Hoa Thanh Nguyen and Catherine A Brissette

PLOS Computational Biology, 2026, vol. 22, issue 6, 1-25

Abstract: Lyme disease, the most common tick-borne infectious disease in the United States, presents with highly variable clinical outcomes, ranging from localized erythema migrans to severe disseminated complications affecting the heart, joints, and nervous system. The bacterial determinants underlying this phenotypic variation remain largely unknown, limiting our ability to predict disease progression and optimize treatment strategies. Here, we applied machine learning (ML) approaches to identify specific amino acid residues within surface-exposed virulence factors that predict human dissemination phenotypes. Utilizing the published whole genome sequences from 299 clinical Borrelia burgdorferi isolates collected from the United States and Slovenia over a 30-year period (1992–2021), we extracted and characterized translated amino acid sequences (variants) of seven known virulence factors (BB_0406, BBK32, DbpA, OspA, OspC, P66, and RevA). Protein variants were classified based on their association with disseminated versus localized infections using clinical metadata. Cramér’s V analysis revealed possible strong associations between dissemination phenotypes and five adhesins: BBK32, DbpA, OspC, P66, and RevA. We developed ML models using five algorithms with multiple feature selection strategies, achieving robust predictive performance for DbpA, OspC, and RevA variants (all performance metrics > 0.7). Feature importance analysis identified 57, 29, and 42 key predictive residues for DbpA, OspC, and RevA, respectively. Notably, B-cell epitope prediction revealed significant enrichment of ML-identified residues within predicted epitope regions for OspC (11 overlapping residues, OR = 3.57, p = 0.006) and RevA (12 overlapping residues, OR = 2.37, p = 0.048), suggesting these residues may influence immune recognition and bacterial persistence. This study establishes the first computational framework linking Borrelia protein sequence variants to clinical dissemination phenotypes, providing molecular insights into Lyme disease pathogenesis that may inform the development of improved diagnostics and therapeutic targets.

Author summary: Lyme disease, caused by Borrelia burgdorferi (Bb), afflicts over 470,000 Americans annually, yet predicting which infections will progress from localized to disseminated disease remains a critical clinical challenge. Despite decades of research, the bacterial factors underlying these divergent disease trajectories have remained poorly understood. To address this gap, we leveraged the largest collection of clinical Bb genomes to date and applied machine learning to identify specific amino acid residues within key membrane surface proteins that distinguish invasive from non-invasive strains. Our analysis reveals that many of these predictive residues overlap with immune recognition and host-protein binding sites, suggesting a mechanistic link between sequence variation, immune evasion, and disease severity. This computational framework provides a foundation for investigating key virulence factors in Bb pathogenesis and developing rapid, sequence-based diagnostic tools that could guide treatment strategies. Beyond Lyme disease, our methodology demonstrates broad applicability to other bacterial pathogens where strain-level variation influences clinical outcomes.
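The abstract reports two association statistics: Cramér's V between protein variants and dissemination phenotypes, and an odds ratio (OR) for the overlap between ML-identified residues and predicted epitope regions. The sketch below illustrates how such quantities are commonly computed. It is not the authors' code; the function names, labels, and counts are hypothetical, and the abstract does not specify whether a bias correction was applied to Cramér's V or which test produced the reported p-values.

```python
import math
from collections import Counter


def cramers_v(variants, phenotypes):
    """Uncorrected Cramér's V between two equal-length categorical sequences.

    Assumption: no bias correction is applied; the abstract does not say
    which variant of the statistic the authors used.
    """
    if len(variants) != len(phenotypes):
        raise ValueError("sequences must have the same length")
    n = len(variants)
    row_totals = Counter(variants)
    col_totals = Counter(phenotypes)
    joint = Counter(zip(variants, phenotypes))
    # Degrees-of-freedom term: min(rows - 1, columns - 1).
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k <= 0:
        return 0.0
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


def odds_ratio(both, ml_only, epitope_only, neither):
    """Sample odds ratio for a 2x2 table of residue counts.

    both:         ML-identified residues inside predicted epitopes
    ml_only:      ML-identified residues outside predicted epitopes
    epitope_only: other residues inside predicted epitopes
    neither:      other residues outside predicted epitopes
    """
    denominator = ml_only * epitope_only
    if denominator == 0:
        return math.inf
    return (both * neither) / denominator


# Toy data (hypothetical labels and counts, not taken from the study).
toy_variants = ["type_A", "type_A", "type_B", "type_B"]
toy_phenotypes = ["disseminated", "disseminated", "localized", "localized"]
print(cramers_v(toy_variants, toy_phenotypes))
print(odds_ratio(10, 5, 20, 40))
```

Cramér's V ranges from 0 (no association) to 1 (each variant maps to exactly one phenotype), which makes it convenient for comparing proteins with different numbers of variants. An odds ratio above 1 indicates that the selected residues fall inside predicted epitopes more often than the remaining residues do.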

Date: 2026

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014407 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14407&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014407

DOI: 10.1371/journal.pcbi.1014407

More articles in PLOS Computational Biology from Public Library of Science
Page updated 2026-06-21
Handle: RePEc:plo:pcbi00:1014407