Unsupervised learning methods for efficient geographic clustering and identification of disease disparities with applications to county-level colorectal cancer incidence in California

McMahon, Mallory E.; Doroshenko, Lyubov; Roostaei, Javad; Cho, Hyunsoon; Haider, Mansoor A.

Mallory E. McMahon, Lyubov Doroshenko, Javad Roostaei, Hyunsoon Cho and Mansoor A. Haider
Additional contact information
Mallory E. McMahon: North Carolina State University
Lyubov Doroshenko: La Sapienza University of Rome
Javad Roostaei: UNC Gillings School of Global Public Health Chapel Hill
Hyunsoon Cho: National Cancer Center
Mansoor A. Haider: North Carolina State University

Health Care Management Science, 2022, vol. 25, issue 4, No 4, 574-589

Abstract: Many public health policymaking questions involve data subsets representing application-specific attributes and geographic location. We develop and evaluate standard and tailored techniques for clustering via unsupervised learning (UL) algorithms on such amalgamated (dual-domain) data sets. The aim of the associated algorithms is to identify geographically efficient clusters that also maximize the number of statistically significant differences in disease incidence and demographic variables across top clusters. Two standard UL approaches, k-means with k-means++ initialization (k++) and the standard self-organizing map (SSOM), are considered along with a new, tailored version of the SOM (TSOM). The TSOM algorithm involves optimization of a customized objective function with terms promoting individual geographic cluster cohesion while also maximizing the number of differences across clusters, and two hyper-parameters controlling the relative weighting of geographic and attribute subspaces in a non-Euclidean distance measure within the clustering problem. The performance of these three techniques (k++, SSOM, TSOM) is compared and evaluated in the context of a data set for colorectal cancer incidence in the state of California, at the level of individual counties. Clusters are visualized via choropleth maps, and ordered graphs are used to illustrate disparities in disease incidence among four identity groups. While all three approaches performed well, the TSOM identified the largest number of disease and demographic disparities while also yielding more geographically efficient top clusters. Techniques presented in this study are relevant to applications including the delivery of health care resources and identifying disparities among identity groups, and to questions involving coordination between county- and state-level policymakers.
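
The abstract describes a distance measure that weights geographic and attribute subspaces through two hyper-parameters, but it does not state the measure's form. The sketch below is only an illustration of the general idea under stated assumptions: it combines a great-circle (haversine) distance with a Euclidean attribute distance as a weighted sum. The names `alpha`, `beta`, `dual_domain_distance`, and `assign`, as well as the weighted-sum form itself, are hypothetical and are not taken from the paper.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def dual_domain_distance(u, v, alpha=1.0, beta=1.0):
    """Weighted sum of geographic and attribute distances.

    ASSUMPTION: this weighted-sum form and the weights alpha/beta are
    illustrative only; the paper's actual measure is not given in the abstract.
    u and v are dicts with keys 'lat', 'lon' (degrees) and 'attrs' (list of floats).
    """
    geo = haversine_km(u["lat"], u["lon"], v["lat"], v["lon"])
    attr = math.dist(u["attrs"], v["attrs"])  # Euclidean distance in attribute space
    return alpha * geo + beta * attr


def assign(points, prototypes, alpha=1.0, beta=1.0):
    """Return, for each point, the index of its nearest prototype."""
    return [
        min(
            range(len(prototypes)),
            key=lambda k: dual_domain_distance(p, prototypes[k], alpha, beta),
        )
        for p in points
    ]
```

In practice the two terms would need to be put on comparable scales (for example, by standardizing attributes and normalizing distances) before the weights become meaningful. Setting `alpha=0` reduces the assignment to pure attribute clustering, and `beta=0` to pure geographic clustering.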

Keywords: Spatial clustering; Self-organizing map; K-Means clustering; Optimization; Colorectal cancer
Date: 2022
Downloads: (external link)
http://link.springer.com/10.1007/s10729-022-09604-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:kap:hcarem:v:25:y:2022:i:4:d:10.1007_s10729-022-09604-5

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10729

DOI: 10.1007/s10729-022-09604-5

Health Care Management Science is currently edited by Yasar Ozcan

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.