Spatial Analysis of COVID-19 Infection Patterns Using Unsupervised Classification with K-Means Clustering
Bisiriyu Lawal Oluwafemi,
Oluwatobi Lucas Akinbode,
Ogunetimoju Abolaji Moses and
Sadoh Afeez Papa
Additional contact information
Bisiriyu Lawal Oluwafemi: Department of Statistics & Data Science, Obafemi Awolowo University, Ile-Ife, Osun State, Nigeria
Oluwatobi Lucas Akinbode: Department of Mathematics & Physics, North Carolina Agricultural and Technical State University, Greensboro, United States
Ogunetimoju Abolaji Moses: Department of Statistics, Ekiti State University, Ado-Ekiti, Ekiti State, Nigeria
Sadoh Afeez Papa: Department of Statistics, Auchi Polytechnic, Auchi, Edo State, Nigeria
International Journal of Research and Innovation in Applied Science, 2025, vol. 10, issue 3, 170-183
Abstract:
Although the spread of COVID-19 has been researched extensively, a notable gap remains in understanding its spatial variability through advanced geospatial techniques. Many studies focus on temporal trends or national-level analyses and overlook the localized differences that affect the progression and severity of the disease. In this study, we conducted an in-depth Exploratory Data Analysis (EDA) to examine the global spatial and temporal distribution of COVID-19 cases. Using visual analytics such as count plots, bar charts, histograms, and box plots, we identified key trends in confirmed, active, recovered, and death cases. Our findings show that South Africa had the highest number of confirmed cases in Africa, while Russia and the United Kingdom led in Europe. Iran, Pakistan, and Saudi Arabia reported the highest case counts in the Eastern Mediterranean, whereas the United States and Brazil recorded the highest peaks in the Americas. India had the most cases in Asia, while China reported the highest case counts in the Western Pacific. A correlation matrix indicated strong positive relationships between confirmed cases and deaths (0.91), recoveries (0.90), and active cases (0.95), showing that mortality and recovery counts rose closely with case numbers. Spatial dependence analysis using Moran’s Global Index and the Getis-Ord Gi statistic confirmed notable clustering of COVID-19 cases. Hotspot analysis identified high case concentrations in North America, Europe, and Asia, while Africa and parts of South America exhibited lower infection rates. K-means clustering grouped the pandemic’s spread into two distinct clusters: regions with the highest case counts and regions with the lowest. Our findings emphasize the non-random geographic distribution of COVID-19 cases, highlighting regional disparities and providing insights for targeted interventions.
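The two core techniques named in the abstract, a global Moran's I test for spatial autocorrelation and a two-cluster K-means split, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the six regional case counts are hypothetical toy values, the chain-shaped neighbour matrix is an assumption (the abstract does not state which spatial weights were used), and the helper names `morans_i` and `kmeans_two_clusters_1d` are invented for this sketch.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for 1-D values and an n x n spatial weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    n = x.size
    return (n / w.sum()) * (z @ w @ z) / (z @ z)


def kmeans_two_clusters_1d(values, n_iter=100):
    """Lloyd's algorithm with k=2 on 1-D data, seeded at the min and max."""
    x = np.asarray(values, dtype=float)
    if x.min() == x.max():
        raise ValueError("need at least two distinct values to form two clusters")
    centers = np.array([x.min(), x.max()])
    for _ in range(n_iter):
        # Assign each value to its nearest center (0 = low, 1 = high).
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([x[labels == k].mean() for k in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical case counts for six regions laid out along a line (toy data).
cases = [900, 850, 800, 40, 30, 20]

# Assumed binary weights: each region neighbours the one before and after it.
n = len(cases)
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

moran = morans_i(cases, W)
labels, centers = kmeans_two_clusters_1d(cases)

print(f"Moran's I: {moran:.4f}")
print("cluster labels:", labels.tolist())
print("cluster centers:", centers.tolist())
```

With these toy values, Moran's I is positive, meaning neighbouring regions tend to have similar counts, and K-means separates the three high-count regions from the three low-count ones. A full analysis would also need a significance test for Moran's I (for example, by permutation) and a spatial weight matrix built from real geography.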
Date: 2025
Downloads: (external link)
https://www.rsisinternational.org/journals/ijrias/ ... -issue-3/170-183.pdf (application/pdf)
https://rsisinternational.org/journals/ijrias/arti ... -k-means-clustering/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:bjf:journl:v:10:y:2025:i:3:p:170-183
International Journal of Research and Innovation in Applied Science is currently edited by Dr. Renu Malsaria