Faecal microbiome-based machine learning for multi-class disease diagnosis
Qi Su,
Qin Liu,
Raphaela Iris Lau,
Jingwan Zhang,
Zhilu Xu,
Yun Kit Yeoh,
Thomas W. H. Leung,
Whitney Tang,
Lin Zhang,
Jessie Q. Y. Liang,
Yuk Kam Yau,
Jiaying Zheng,
Chengyu Liu,
Mengjing Zhang,
Chun Pan Cheung,
Jessica Y. L. Ching,
Hein M. Tun,
Jun Yu,
Francis K. L. Chan and
Siew C. Ng
Additional contact information
Qi Su: Microbiota I-Center (MagIC)
Qin Liu: Microbiota I-Center (MagIC)
Raphaela Iris Lau: Microbiota I-Center (MagIC)
Jingwan Zhang: Microbiota I-Center (MagIC)
Zhilu Xu: Microbiota I-Center (MagIC)
Yun Kit Yeoh: Microbiota I-Center (MagIC)
Thomas W. H. Leung: The Chinese University of Hong Kong
Whitney Tang: Microbiota I-Center (MagIC)
Lin Zhang: Microbiota I-Center (MagIC)
Jessie Q. Y. Liang: The Chinese University of Hong Kong
Yuk Kam Yau: Microbiota I-Center (MagIC)
Jiaying Zheng: Microbiota I-Center (MagIC)
Chengyu Liu: Microbiota I-Center (MagIC)
Mengjing Zhang: Microbiota I-Center (MagIC)
Chun Pan Cheung: Microbiota I-Center (MagIC)
Jessica Y. L. Ching: Microbiota I-Center (MagIC)
Hein M. Tun: Microbiota I-Center (MagIC)
Jun Yu: The Chinese University of Hong Kong
Francis K. L. Chan: Microbiota I-Center (MagIC)
Siew C. Ng: Microbiota I-Center (MagIC)
Nature Communications, 2022, vol. 13, issue 1, 1-8
Abstract:
Systemic characterisation of the human faecal microbiome provides the opportunity to develop non-invasive approaches in the diagnosis of a major human disease. However, shared microbial signatures across different diseases make accurate diagnosis challenging in single-disease models. Herein, we present a machine-learning multi-class model using a faecal metagenomic dataset of 2,320 individuals with nine well-characterised phenotypes, including colorectal cancer, colorectal adenomas, Crohn’s disease, ulcerative colitis, irritable bowel syndrome, obesity, cardiovascular disease, post-acute COVID-19 syndrome and healthy individuals. Our processed data covers 325 microbial species derived from 14.3 terabytes of sequence. The trained model achieves an area under the receiver operating characteristic curve (AUROC) of 0.90 to 0.99 (interquartile range, IQR, 0.91–0.94) in predicting different diseases in the independent test set, with a sensitivity of 0.81 to 0.95 (IQR, 0.87–0.93) at a specificity of 0.76 to 0.98 (IQR 0.83–0.95). Metagenomic analysis of public datasets comprising 1,597 samples across different populations shows comparable predictions, with AUROC of 0.69 to 0.91 (IQR 0.79–0.87). Correlation of the top 50 microbial species with disease phenotypes identifies 363 significant associations (FDR
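The abstract reports a per-disease AUROC, a sensitivity at a chosen specificity, and an IQR across diseases, all from a single multi-class model. This record does not say how the per-disease figures are derived; a common convention is one-vs-rest evaluation, where each disease is scored as "this class" versus all other classes pooled. The following is a minimal sketch under that assumption, in standard-library Python. The function names (`auroc`, `one_vs_rest_auroc`, `sensitivity_specificity`, `iqr`) and the toy probabilities are hypothetical illustrations, not the authors' code or data.

```python
# Hypothetical sketch: one-vs-rest evaluation of a multi-class classifier.
# Function names and toy data are illustrative assumptions, not the paper's code.
from statistics import quantiles


def auroc(scores: list[float], labels: list[bool]) -> float:
    """AUROC via the Mann-Whitney U statistic: the probability that a random
    positive scores higher than a random negative (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(probs: list[dict[str, float]], truth: list[str]) -> dict[str, float]:
    """For each class c, treat c as positive and all other classes as negative,
    scoring every sample by the model's predicted probability for c."""
    return {
        c: auroc([p[c] for p in probs], [t == c for t in truth])
        for c in sorted(set(truth))
    }


def sensitivity_specificity(
    scores: list[float], labels: list[bool], threshold: float
) -> tuple[float, float]:
    """Sensitivity and specificity when samples with score >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


def iqr(values: list[float]) -> tuple[float, float]:
    """25th and 75th percentiles using linear interpolation ('inclusive' method)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


# Toy predicted class probabilities (made up for illustration only).
TOY_PROBS = [
    {"CRC": 0.7, "CD": 0.2, "healthy": 0.1},
    {"CRC": 0.6, "CD": 0.3, "healthy": 0.1},
    {"CRC": 0.2, "CD": 0.7, "healthy": 0.1},
    {"CRC": 0.3, "CD": 0.4, "healthy": 0.3},
    {"CRC": 0.1, "CD": 0.2, "healthy": 0.7},
    {"CRC": 0.4, "CD": 0.35, "healthy": 0.25},
]
TOY_TRUTH = ["CRC", "CRC", "CD", "CD", "healthy", "healthy"]

if __name__ == "__main__":
    per_class = one_vs_rest_auroc(TOY_PROBS, TOY_TRUTH)
    print("per-class AUROC:", per_class)
    print("IQR across classes:", iqr(list(per_class.values())))
    healthy_scores = [p["healthy"] for p in TOY_PROBS]
    healthy_labels = [t == "healthy" for t in TOY_TRUTH]
    print("healthy sens/spec at 0.2:",
          sensitivity_specificity(healthy_scores, healthy_labels, 0.2))
```

The pairwise loop is quadratic but keeps the Mann-Whitney interpretation of AUROC visible; for large test sets a rank-based formulation or a library routine such as scikit-learn's `roc_auc_score` would be the more practical choice.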
Date: 2022
Downloads (external link): https://www.nature.com/articles/s41467-022-34405-3
Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-34405-3
Ordering information: This journal article can be ordered from https://www.nature.com/ncomms/
DOI: 10.1038/s41467-022-34405-3
Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.