Infectious Disease Relational Data Analysis Using String Grammar Non-Euclidean Relational Fuzzy C-Means
Apiwat Budwong,
Sansanee Auephanwiriyakul and
Nipon Theera-Umpon
Additional contact information
Apiwat Budwong: Department of Computer Engineering, Faculty of Engineering, Graduate School, Chiang Mai University, Chiang Mai 50200, Thailand
Sansanee Auephanwiriyakul: Department of Computer Engineering, Faculty of Engineering, Excellence Center in Infrastructure Technology and Transportation Engineering, Biomedical Engineering Institute, Chiang Mai University, Chiang Mai 50200, Thailand
Nipon Theera-Umpon: Department of Electrical Engineering, Faculty of Engineering, Biomedical Engineering Institute, Chiang Mai University, Chiang Mai 50200, Thailand
IJERPH, 2021, vol. 18, issue 15, 1-18
Abstract:
Statistical analysis in infectious diseases is becoming more important, especially in prevention policy development. To achieve that, the epidemiology, a study of the relationship between the occurrence and who/when/where, is needed. In this paper, we develop the string grammar non-Euclidean relational fuzzy C-means (sgNERF-CM) algorithm to determine a relationship inside the data from the age, career, and month viewpoint for all provinces in Thailand for the dengue fever, influenza, and Hepatitis B virus (HBV) infection. The Dunn’s index is used to select the best models because of its ability to identify the compact and well-separated clusters. We compare the results of the sgNERF-CM algorithm with the string grammar relational hard C-means (sgRHCM) algorithm. In addition, their numerical counterparts, i.e., relational hard C-means (RHCM) and non-Euclidean relational fuzzy C-means (NERF-CM) algorithms are also applied in the comparison. We found that the sgNERF-CM algorithm is far better than the numerical counterparts and better than the sgRHCM algorithm in most cases. From the results, we found that the month-based dataset does not help in relationship-finding since the diseases tend to happen all year round. People from different age ranges in different regions in Thailand have different numbers of dengue fever infections. The occupations that have a higher chance to have dengue fever are student and teacher groups from the central, north-east, north, and south regions. Additionally, students in all regions, except the central region, have a high risk of dengue infection. For the influenza dataset, we found that a group of people with the age of more than 1 year to 64 years old has higher number of influenza infections in every province. Most occupations in all regions have a higher risk of infecting the influenza. For the HBV dataset, people in all regions with an age between 10 to 65 years old have a high risk in infecting the disease. In addition, only farmer and general contractor groups in all regions have high chance of infecting HBV as well.
Keywords: relational data; string grammar non-Euclidean relational fuzzy C-means; Levenshtein distance; dengue fever; influenza; Hepatitis B virus (HBV) (search for similar items in EconPapers)
JEL-codes: I I1 I3 Q Q5 (search for similar items in EconPapers)
Date: 2021
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/1660-4601/18/15/8153/pdf (application/pdf)
https://www.mdpi.com/1660-4601/18/15/8153/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:18:y:2021:i:15:p:8153-:d:606587
Access Statistics for this article
IJERPH is currently edited by Ms. Jenna Liu
More articles in IJERPH from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().