Integrating machine learning with SHAP to uncover multi-tissue molecular signatures in Osteoarthritis progression
Jifeng Zhao,
Jiasheng Tao,
Yizhe Song,
Jiyong Yang,
Xiaodong Lin,
Zhilong Ye,
Chao Lu,
Mingzhu Zeng,
Weijian Chen and
Wengang Liu
PLOS ONE, 2026, vol. 21, issue 3, 1-20
Abstract:
Osteoarthritis (OA) is a chronic joint disorder characterized by pain, reduced mobility, and structural degeneration. Despite its complex etiology and multi-tissue involvement, the molecular mechanisms underlying OA remain poorly understood. This study aimed to identify tissue-specific diagnostic biomarkers using an integrative framework combining multiple machine learning (ML) algorithms and SHapley Additive exPlanations (SHAP). Gene expression profiles from cartilage, synovium, and peripheral blood were retrieved from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were identified across tissues, followed by feature selection using Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine Recursive Feature Elimination (SVM-RFE), and Random Forest (RF). Functional enrichment, gene set variation analysis (GSVA), and immune infiltration analyses were conducted. Ten ML models were constructed to evaluate diagnostic performance. A total of 8, 28, and 61 DEGs were identified in cartilage, synovium, and blood, respectively. Enrichment analysis revealed key roles in inflammatory signaling, metabolism, and immune pathways. Biomarkers identified included CSN1S1, ABCA6, RARRES1, NPTX2 (cartilage); SCRG1, CXCL2, PTGDS, CCL19, BGN, KLF9 (synovium); and GNL3L, C6orf111, NT5C3, ZNF148 (blood). Immune analysis indicated shifts in mast cells and CD8+ T cells in cartilage and dendritic cells in synovium, while no significant immune alterations were found in blood. Diagnostic models demonstrated strong performance, with AUCs of 0.839 (cartilage), 0.934 (synovium), and 0.892 (blood). SHAP analysis was employed to interpret each model by quantifying the contribution of individual genes to predicted outcomes. In the optimal cartilage model, CSN1S1 and ABCA6 were the most influential features, with mean absolute SHAP values of 0.146 and 0.122, respectively. For synovium, SCRG1 (0.111) and CXCL2 (0.097) were top contributors, while in blood, GNL3L (0.148) and C6orf111 (0.143) showed the highest predictive importance. These results underscore the interpretability of the models and validate the functional relevance of selected biomarkers. Collectively, this study provides a robust ML-based framework for identifying and interpreting reliable OA biomarkers across multiple tissues, offering valuable insights into disease mechanisms and supporting the development of diagnostic tools.
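The abstract ranks genes by mean absolute SHAP value. As a rough illustration of what that quantity measures, the sketch below computes exact Shapley values for a small toy model and averages their absolute values per feature. It is not the authors' pipeline: the gene names (`gene_a`, `gene_b`, `gene_c`), the weights, the samples, and the use of a single background point (the per-feature mean) in place of a background dataset are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to their background value
    (a simplification: SHAP tools usually average over a background set).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else background[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def mean_abs_shap(predict, samples, background):
    """Average |Shapley value| per feature across all samples."""
    totals = [0.0] * len(background)
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, background)):
            totals[i] += abs(value)
    return [t / len(samples) for t in totals]


# Hypothetical toy model and data, not taken from the study.
GENES = ["gene_a", "gene_b", "gene_c"]
WEIGHTS = [0.5, -0.3, 0.1]


def toy_model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))


samples = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [2.0, 2.0, 1.0]]
background = [sum(col) / len(samples) for col in zip(*samples)]

importance = mean_abs_shap(toy_model, samples, background)
ranking = sorted(zip(GENES, importance), key=lambda p: p[1], reverse=True)

for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

Exact enumeration scales exponentially with the number of features, so it is only practical for very small feature sets such as the toy one here; dedicated SHAP libraries rely on model-specific or sampling-based approximations instead.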
Date: 2026
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0343226 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 43226&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0343226
DOI: 10.1371/journal.pone.0343226
More articles in PLOS ONE from Public Library of Science