On Rank Selection in Non-Negative Matrix Factorization Using Concordance

Paul Fogel, Christophe Geissler, Nicolas Morizet and George Luta
Additional contact information
Paul Fogel: Mazars, Tour Exaltis, 61 Rue Henri-Régnault, 92400 Courbevoie, France
Christophe Geissler: Mazars, Tour Exaltis, 61 Rue Henri-Régnault, 92400 Courbevoie, France
Nicolas Morizet: Mazars, Tour Exaltis, 61 Rue Henri-Régnault, 92400 Courbevoie, France
George Luta: Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, 3700 O St NW, Washington, DC 20057, USA

Mathematics, 2023, vol. 11, issue 22, 1-18

Abstract: The choice of the factorization rank of a matrix is critical, e.g., in dimensionality reduction, filtering, clustering, deconvolution, etc., because selecting a rank that is too high amounts to adjusting the noise, while selecting a rank that is too low results in the oversimplification of the signal. Numerous methods for selecting the factorization rank of a non-negative matrix have been proposed. One of them is the cophenetic correlation coefficient (ccc), widely used in data science to evaluate the number of clusters in a hierarchical clustering. In previous work, it was shown that ccc performs better than other methods for rank selection in non-negative matrix factorization (NMF) when the underlying structure of the matrix consists of orthogonal clusters. In this article, we show that using the ratio of ccc to the approximation error significantly improves the accuracy of the rank selection. We also propose a new criterion, concordance, which, like ccc, benefits from the stochastic nature of NMF; its accuracy is also improved by using its ratio-to-error form. Using real and simulated data, we show that concordance, with a CUSUM-based automatic detection algorithm for its original or ratio-to-error forms, significantly outperforms ccc. It is important to note that the new criterion works for a broader class of matrices, where the underlying clusters are not assumed to be orthogonal.
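
For readers unfamiliar with the ccc criterion mentioned in the abstract, the following minimal Python sketch shows one common way to compute it for a candidate rank, together with a ratio-to-error value. It is not the authors' implementation. Several choices are assumptions made for illustration only: the function names `nmf_mu` and `ccc_and_error`, the use of plain multiplicative-update NMF, the assignment of each column to the component with the largest coefficient, the consensus matrix averaged over random restarts, and the relative Frobenius norm as the approximation error. The concordance criterion and the CUSUM detection step are not defined on this page and are therefore not sketched.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform


def nmf_mu(X, rank, rng, n_iter=200, eps=1e-9):
    """Basic NMF via multiplicative updates (Frobenius loss); illustrative only."""
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def ccc_and_error(X, rank, n_runs=10, seed=0):
    """Return (ccc, mean relative error) for one candidate rank.

    Assumption: columns of X are the items being clustered, and each column
    is assigned to the component with the largest coefficient in H.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    consensus = np.zeros((m, m))
    errors = []
    for _ in range(n_runs):
        W, H = nmf_mu(X, rank, rng)
        labels = H.argmax(axis=0)
        # Connectivity matrix: 1 where two columns share a cluster label.
        consensus += labels[:, None] == labels[None, :]
        errors.append(np.linalg.norm(X - W @ H) / np.linalg.norm(X))
    consensus /= n_runs

    # Turn the consensus matrix into a distance matrix.
    dist = 1.0 - consensus
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    mean_error = float(np.mean(errors))
    if np.ptp(condensed) == 0:
        # All pairwise distances are identical, so the correlation is
        # undefined; treat the clustering as perfectly stable (assumption).
        return 1.0, mean_error
    # Cophenetic correlation of an average-linkage tree built on the distances.
    Z = linkage(condensed, method="average")
    ccc, _ = cophenet(Z, condensed)
    return float(ccc), mean_error
```

A rank scan would call `ccc_and_error(X, r)` for each candidate `r` and compare either the ccc values or the ratios `ccc / error`; how the best rank is then picked from that curve is left open here.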

Keywords: clustering; dimensionality reduction; machine learning; NMF; cophenetic correlation coefficient; concordance; CUSUM
JEL-codes: C
Date: 2023
Downloads: (external link)
https://www.mdpi.com/2227-7390/11/22/4611/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/22/4611/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:22:p:4611-:d:1278024

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:11:y:2023:i:22:p:4611-:d:1278024