Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health

Fogel, Paul; Gaston-Mathé, Yann; Hawkins, Douglas; Fogel, Fajwel; Luta, George; Young, S. Stanley

Paul Fogel: Independent Consultant, Paris 75006, France
Yann Gaston-Mathé: YGM Consult, CEO, Paris 75015, France
Douglas Hawkins: School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
Fajwel Fogel: Institute Louis Bachelier, Paris 75002, France
George Luta: Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University, Washington, DC 20057, USA
S. Stanley Young: CGStat, CEO, Raleigh, NC 27607, USA

IJERPH, 2016, vol. 13, issue 5, 1-14

Abstract: Data can often be represented as a matrix, e.g., with observations as rows and variables as columns, or as a doubly classified contingency table. Researchers may be interested in clustering the observations, the variables, or both. If the data are non-negative, then Non-negative Matrix Factorization (NMF) can be used to perform the clustering. By its nature, NMF-based clustering focuses on the large values. If the data are normalized by subtracting the row/column means, they take mixed signs and the original NMF can no longer be used. Our idea is to split the matrix into its positive and negative parts, take the absolute value of the negative elements, and then concatenate the two parts. NMF applied to the concatenated data, which we call PosNegNMF, offers the advantages of the original NMF approach while giving equal weight to large and small values. We use two public health datasets to illustrate the new method and compare it with alternative clustering methods, such as K-means and clustering methods based on the Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). Except in situations where a reasonably accurate factorization can be achieved using the first SVD component, we recommend that epidemiologists and environmental scientists use the new method to obtain clusters with improved quality and interpretability.
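The split-and-concatenate step described in the abstract can be sketched as follows. This is a minimal illustration under assumptions that are ours, not the authors': the variables (columns) are mean-centered, the positive part and the absolute value of the negative part are placed side by side so that each row (observation) stays intact, NMF is fitted with the standard Lee-Seung multiplicative updates for the Frobenius loss, and each row is assigned to the component with its largest loading. The function names are hypothetical, and the authors' own implementation may differ in normalization, initialization, stopping rule and cluster assignment.

```python
import numpy as np


def center_columns(X):
    # Subtract each column (variable) mean; the result generally has mixed signs.
    return X - X.mean(axis=0, keepdims=True)


def posneg_split(X):
    # Positive part and absolute value of the negative part, concatenated
    # column-wise (assumption) so that row i of the output is still observation i.
    return np.hstack([np.maximum(X, 0.0), np.maximum(-X, 0.0)])


def nmf(V, k, n_iter=1000, seed=0, eps=1e-10):
    # Lee-Seung multiplicative updates minimizing ||V - W H||_F^2 with W, H >= 0.
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def posneg_nmf_row_clusters(X, k, **nmf_kwargs):
    # Hypothetical PosNegNMF-style pipeline: center, split, factorize, assign.
    V = posneg_split(center_columns(X))
    W, H = nmf(V, k, **nmf_kwargs)
    # Each row goes to the component with the largest loading (assumed rule).
    return W.argmax(axis=1), W, H


if __name__ == "__main__":
    # Toy data: rows 0-2 are high on the first two variables, rows 3-5 on the last two.
    X = np.array([
        [5.0, 5.0, 1.0, 1.0],
        [6.0, 5.0, 1.0, 0.0],
        [5.0, 6.0, 0.0, 1.0],
        [1.0, 1.0, 5.0, 5.0],
        [0.0, 1.0, 6.0, 5.0],
        [1.0, 0.0, 5.0, 6.0],
    ])
    labels, _, _ = posneg_nmf_row_clusters(X, k=2)
    print(labels)
```

In the toy matrix, centering turns each group's low values into negative entries; after the split, those low values become large non-negative entries in their own columns, so NMF can use them as evidence for a cluster instead of ignoring them. This mirrors the abstract's point that PosNegNMF gives weight to small values as well as large ones.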

Keywords: SVD; PCA; NMF; K-means
JEL-codes: I I1 I3 Q Q5
Date: 2016

Downloads:
https://www.mdpi.com/1660-4601/13/5/509/pdf (application/pdf)
https://www.mdpi.com/1660-4601/13/5/509/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:13:y:2016:i:5:p:509-:d:70287

IJERPH is currently edited by Ms. Jenna Liu

More articles in IJERPH from MDPI