Decoding Diabetes Biomarkers and Related Molecular Mechanisms by Using Machine Learning, Text Mining, and Gene Expression Analysis
Amira M. Elsherbini,
Alsamman M. Alsamman,
Nehal M. Elsherbiny,
Mohamed El-Sherbiny,
Rehab Ahmed,
Hasnaa Ali Ebrahim and
Joaira Bakkach
Additional contact information
Amira M. Elsherbini: Department of Oral Biology, Faculty of Dentistry, Mansoura University, Mansoura 35116, Egypt
Alsamman M. Alsamman: Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza 12619, Egypt
Nehal M. Elsherbiny: Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Tabuk, Tabuk 71491, Saudi Arabia
Mohamed El-Sherbiny: Department of Basic Medical Sciences, College of Medicine, AlMaarefa University, Riyadh 71666, Saudi Arabia
Rehab Ahmed: Department of Natural Products and Alternative Medicine, Faculty of Pharmacy, University of Tabuk, Tabuk 71491, Saudi Arabia
Hasnaa Ali Ebrahim: Department of Basic Medical Sciences, College of Medicine, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
Joaira Bakkach: Biomedical Genomics and Oncogenetics Research Laboratory, Faculty of Sciences and Techniques of Tangier, Abdelmalek Essaâdi University Morocco, Tétouan 93000, Morocco
IJERPH, 2022, vol. 19, issue 21, 1-18
Abstract:
The molecular basis of diabetes mellitus is yet to be fully elucidated. We aimed to identify the most frequently reported and differentially expressed genes (DEGs) in diabetes by using bioinformatics approaches. Text mining was used to screen 40,225 article abstracts from the diabetes literature. These studies highlighted 5939 diabetes-related genes spread across 22 human chromosomes, with 112 genes mentioned in more than 50 studies. Among these genes, HNF4A, PPARA, VEGFA, TCF7L2, HLA-DRB1, PPARG, NOS3, KCNJ11, PRKAA2, and HNF1A were mentioned in more than 200 articles. These genes are correlated with the regulation of glycogen and polysaccharide, adipogenesis, AGE/RAGE, and macrophage differentiation. Three datasets (44 patients and 57 controls) were subjected to gene expression analysis. The analysis revealed 135 significant DEGs, of which CEACAM6, ENPP4, HDAC5, HPCAL1, PARVG, STYXL1, VPS28, ZBTB33, ZFP37, and CCDC58 were the top 10 DEGs. These genes were enriched in aerobic respiration, T-cell antigen receptor pathway, tricarboxylic acid metabolic process, vitamin D receptor pathway, toll-like receptor signaling, and endoplasmic reticulum (ER) unfolded protein response. The results of the text mining and gene expression analyses were used as attribute values for machine learning (ML) analysis. The decision tree, extra-tree regressor, and random forest algorithms were used in the ML analysis to identify unique markers that could be used as diabetes diagnosis tools. These algorithms produced prediction models with accuracies ranging from 0.6364 to 0.88 and an overall confidence interval (CI) of 95%. There were 39 biomarkers that could distinguish diabetic and non-diabetic patients, 12 of which were repeated multiple times. The majority of these genes are associated with stress response, signalling regulation, locomotion, cell motility, growth, and muscle adaptation.
Machine learning algorithms highlighted the use of the HLA-DQB1 gene as a biomarker for early detection of diabetes. Our data mining and gene expression analyses have provided useful information about potential biomarkers in diabetes.
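The text-mining step described in the abstract (counting how many abstracts mention each gene, then keeping the genes reported in more than a given number of studies) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the abstract does not specify: the function names, the toy abstracts, and the exact-match, case-sensitive tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer: letters/digits with optional inner hyphens,
# so that symbols such as HLA-DRB1 stay in one piece.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def count_gene_mentions(abstracts, gene_symbols):
    """Return, per gene symbol, the number of abstracts mentioning it."""
    wanted = set(gene_symbols)
    counts = Counter()
    for text in abstracts:
        # A set, so each abstract contributes at most once per gene.
        tokens = set(TOKEN_RE.findall(text))
        counts.update(tokens & wanted)
    return counts


def frequently_reported(counts, min_articles):
    """Genes mentioned in more than `min_articles` abstracts, sorted by name."""
    return sorted(gene for gene, n in counts.items() if n > min_articles)


# Toy data (invented for this example, not taken from the study).
abstracts = [
    "Variants in TCF7L2 and KCNJ11 are associated with type 2 diabetes.",
    "TCF7L2 expression and PPARG signalling in adipogenesis.",
    "HLA-DRB1 alleles and TCF7L2 in autoimmune diabetes.",
]
genes = ["TCF7L2", "KCNJ11", "PPARG", "HLA-DRB1", "NOS3"]

counts = count_gene_mentions(abstracts, genes)
print(frequently_reported(counts, 1))
```

With a real corpus, the threshold would be set to 50 or 200 to mirror the cut-offs quoted in the abstract. Exact token matching is a simplification: it misses gene aliases and lower-case spellings.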
Keywords: diabetes; text mining; gene expression; bioinformatics; protein–protein interaction network
JEL-codes: I I1 I3 Q Q5
Date: 2022
Downloads:
https://www.mdpi.com/1660-4601/19/21/13890/pdf (application/pdf)
https://www.mdpi.com/1660-4601/19/21/13890/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:19:y:2022:i:21:p:13890-:d:953503
IJERPH is currently edited by Ms. Jenna Liu
Bibliographic data for series maintained by MDPI Indexing Manager.