On Rank Selection in Non-Negative Matrix Factorization Using Concordance
Paul Fogel,
Christophe Geissler,
Nicolas Morizet and
George Luta
Additional contact information
Paul Fogel: Mazars, Tour Exaltis, 61 Rue Henri-Régnault, 92400 Courbevoie, France
Christophe Geissler: Mazars, Tour Exaltis, 61 Rue Henri-Régnault, 92400 Courbevoie, France
Nicolas Morizet: Mazars, Tour Exaltis, 61 Rue Henri-Régnault, 92400 Courbevoie, France
George Luta: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, 3700 O St NW, Washington, DC 20057, USA
Mathematics, 2023, vol. 11, issue 22, 1-18
Abstract:
The choice of the factorization rank of a matrix is critical in applications such as dimensionality reduction, filtering, clustering, and deconvolution, because selecting a rank that is too high amounts to fitting the noise, while selecting a rank that is too low results in the oversimplification of the signal. Numerous methods for selecting the factorization rank of a non-negative matrix have been proposed. One of them is the cophenetic correlation coefficient (ccc), widely used in data science to evaluate the number of clusters in a hierarchical clustering. In previous work, it was shown that ccc performs better than other methods for rank selection in non-negative matrix factorization (NMF) when the underlying structure of the matrix consists of orthogonal clusters. In this article, we show that using the ratio of ccc to the approximation error significantly improves the accuracy of the rank selection. We also propose a new criterion, concordance, which, like ccc, benefits from the stochastic nature of NMF; its accuracy is also improved by using its ratio-to-error form. Using real and simulated data, we show that concordance, with a CUSUM-based automatic detection algorithm for its original or ratio-to-error forms, significantly outperforms ccc. It is important to note that the new criterion works for a broader class of matrices, where the underlying clusters are not assumed to be orthogonal.
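As a rough illustration of the ccc criterion and its ratio-to-error form mentioned in the abstract, the sketch below follows the common consensus-matrix recipe: run NMF several times from random starts, record how often each pair of rows lands in the same cluster, and compute the cophenetic correlation of the resulting distances. It is not taken from the article. The helper names (nmf_multiplicative, ccc_and_error, rank_scores), the plain multiplicative-update NMF, the row-to-cluster assignment rule, and the use of the relative Frobenius error as "approximation error" are all assumptions made for illustration. The concordance criterion and the CUSUM detection step are not defined in this listing, so they are not sketched.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform


def nmf_multiplicative(X, rank, rng, n_iter=200, eps=1e-9):
    """Basic multiplicative updates for X ~ W @ H (Frobenius loss).

    Hypothetical helper: a stand-in for whatever NMF solver is actually used.
    """
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def ccc_and_error(X, rank, n_runs=10, seed=0):
    """Return (ccc, mean relative error) over several random NMF restarts."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    consensus = np.zeros((n, n))
    errors = []
    for _ in range(n_runs):
        W, H = nmf_multiplicative(X, rank, rng)
        # Assumption: rescale each column of W by the mass of its row in H so
        # components are comparable, then assign each row to its top component.
        labels = (W * H.sum(axis=1)).argmax(axis=1)
        consensus += labels[:, None] == labels[None, :]
        errors.append(np.linalg.norm(X - W @ H) / np.linalg.norm(X))
    consensus /= n_runs
    # Distance = 1 - co-clustering frequency, in condensed form for SciPy.
    dist = 1.0 - consensus
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    Z = linkage(condensed, method="average")
    ccc, _ = cophenet(Z, condensed)
    return float(ccc), float(np.mean(errors))


def rank_scores(X, ranks, n_runs=10, seed=0):
    """Map each candidate rank to (ccc, error, ccc / error)."""
    scores = {}
    for r in ranks:
        ccc, err = ccc_and_error(X, r, n_runs=n_runs, seed=seed + r)
        scores[r] = (ccc, err, ccc / err)
    return scores
```

Candidate ranks should start at 2 in this sketch: with a single component every row receives the same label, the distances are all zero, and the correlation is undefined. How the article normalizes the error or picks the rank from the resulting curve is not stated here, so comparing the ratio across ranks is left to the reader.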
Keywords: clustering; dimensionality reduction; machine learning; NMF; cophenetic correlation coefficient; concordance; CUSUM
JEL-codes: C
Date: 2023
Downloads: (external link)
https://www.mdpi.com/2227-7390/11/22/4611/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/22/4611/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:22:p:4611-:d:1278024
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.