Identifying the Informational/Signal Dimension in Principal Component Analysis

Camiz, Sergio; Pillar, Valério D.

Sergio Camiz and Valério D. Pillar
Additional contact information
Sergio Camiz: Dipartimento di Matematica, Sapienza Università di Roma, 00185 Roma, Italy
Valério D. Pillar: Departamento de Ecologia, Universidade Federal do Rio Grande do Sul, 91501-970 Porto Alegre, Brazil

Mathematics, 2018, vol. 6, issue 11, 1-16

Abstract: Identifying a reduced-dimensional representation of the data is among the main issues of exploratory multidimensional data analysis, and several solutions have been proposed in the literature, depending on the method. Principal Component Analysis (PCA) is the method that has received the most attention thus far. Several identification methods, the so-called stopping rules, have been proposed; they give very different results in practice, and some comparative studies have been carried out. Some inconsistencies in the previous studies led us to try to clarify the distinction between signal and noise in PCA, and its limits, and to propose a new testing method. This consists of producing simulated data according to a predefined eigenvalue structure, including zero eigenvalues. From random populations built according to several such structures, reduced-size samples were extracted, and different levels of random normal noise were added to them. This controlled introduction of noise allows a clear distinction between expected signal and noise, the latter being relegated to the non-zero eigenvalues in the samples that correspond to zero eigenvalues in the population. With this new method, we tested the performance of ten different stopping rules. For every method, every structure, and every noise level, both power (the ability to correctly identify the expected dimension) and type-I error (the detection of a dimension composed only of noise) were measured by counting the relative frequencies with which the smallest non-zero eigenvalue in the population was recognized as signal in the samples and with which the largest zero eigenvalue was recognized as noise, respectively. In this way, the behaviour of the examined methods becomes clear, and their comparison and evaluation are possible.
The reported results show that both Rencher's generalization of Bartlett's test and Pillar's bootstrap method perform much better than all the others: both show reasonable power, decreasing with noise, and very good type-I error. Thus, more than the others, these methods deserve to be adopted.
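
The data-generation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the eigenvalue structure (4, 2, 1, 0, 0), the sample size, the noise level, and the use of a random orthogonal basis are all assumptions chosen for illustration, and observations are drawn directly rather than extracted from a finite simulated population. The sketch shows only that, without noise, the sample eigenvalues matching zero population eigenvalues vanish, while added normal noise makes them non-zero.

```python
import numpy as np

def simulate_sample(eigenvalues, n_samples, noise_sd, rng):
    """Draw observations whose covariance has the given eigenvalues,
    then add independent normal noise (hypothetical helper)."""
    p = len(eigenvalues)
    # Random orthogonal basis from the QR factorization of a Gaussian matrix
    # (an assumption; the abstract does not say how populations were built).
    q, _ = np.linalg.qr(rng.standard_normal((p, p)))
    # Component scores with variance equal to each eigenvalue;
    # zero eigenvalues contribute no signal at all.
    scores = rng.standard_normal((n_samples, p)) * np.sqrt(eigenvalues)
    signal = scores @ q.T
    noise = rng.standard_normal((n_samples, p)) * noise_sd
    return signal + noise

def sample_eigenvalues(x):
    """Eigenvalues of the sample covariance matrix, largest first."""
    cov = np.cov(x, rowvar=False)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

rng = np.random.default_rng(0)
population_eigs = np.array([4.0, 2.0, 1.0, 0.0, 0.0])  # assumed structure

clean = simulate_sample(population_eigs, 100, 0.0, rng)
noisy = simulate_sample(population_eigs, 100, 0.3, rng)

clean_eigs = sample_eigenvalues(clean)
noisy_eigs = sample_eigenvalues(noisy)
print("no noise:  ", np.round(clean_eigs, 4))
print("with noise:", np.round(noisy_eigs, 4))
```

In a full experiment along these lines, a stopping rule would be applied to many such samples, and power and type-I error would be estimated as relative frequencies over the replicates.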

Keywords: Principal Component Analysis; stopping rules; simulated data; rules comparison
JEL-codes: C
Date: 2018

Downloads: (external link)
https://www.mdpi.com/2227-7390/6/11/269/pdf (application/pdf)
https://www.mdpi.com/2227-7390/6/11/269/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:6:y:2018:i:11:p:269-:d:184261
