Chunk-wise regularised PCA-based imputation of missing data

A. Iodice D’Enza, A. Markos and F. Palumbo
Additional contact information
A. Iodice D’Enza: Università degli studi di Napoli Federico II
A. Markos: Democritus University of Thrace
F. Palumbo: Università degli studi di Napoli Federico II

Statistical Methods & Applications, 2022, vol. 31, issue 2, No 14, 365-386

Abstract: Standard multivariate techniques like Principal Component Analysis (PCA) are based on the eigendecomposition of a matrix and therefore require complete data sets. Recent comparative reviews of PCA algorithms for missing data showed the regularised iterative PCA algorithm (RPCA) to be effective. This paper presents two chunk-wise implementations of RPCA suitable for the imputation of “tall” data sets, that is, data sets with many observations. A “chunk” is a subset of the whole set of available observations. One implementation is suitable for distributed computation, as it imputes each chunk independently. The other is suitable for incremental computation, where the imputation of each new chunk is based on all the chunks analysed so far. The proposed procedures were compared to batch RPCA on different data sets and under different missing data mechanisms. Experimental results showed that the distributed approach performed similarly to batch RPCA for data with entries missing completely at random. The incremental approach showed appreciable performance when the data are not missing completely at random and the first analysed chunks contain sufficient information on the data structure.
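As a rough illustration of the distributed variant described in the abstract, the following Python sketch imputes each chunk independently with a simplified regularised iterative PCA. It is not the authors' implementation: the function names `rpca_impute` and `chunkwise_impute`, the parameter defaults, and the noise-variance estimate (the mean of the discarded squared singular values) are all assumptions made for illustration. The incremental variant, which updates the decomposition as new chunks arrive, is not covered here.

```python
import numpy as np


def rpca_impute(X, n_components=2, n_iter=200, tol=1e-6):
    """Simplified regularised iterative PCA imputation (illustrative sketch).

    Missing entries are encoded as NaN. Assumes every column has at least
    one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialise missing cells with the column means of the observed cells.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, d, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = min(n_components, len(d))
        # Assumed noise estimate: mean of the discarded squared singular values.
        sigma2 = np.mean(d[k:] ** 2) if k < len(d) else 0.0
        # Shrink the retained singular values: d - sigma2 / d, floored at zero.
        d_shrunk = np.maximum(d[:k] ** 2 - sigma2, 0.0) / np.maximum(d[:k], 1e-12)
        recon = (U[:, :k] * d_shrunk) @ Vt[:k] + mu
        # Only the missing cells are overwritten; observed cells stay fixed.
        new_filled = np.where(missing, recon, X)
        change = np.sum((new_filled - filled) ** 2)
        filled = new_filled
        if change < tol:
            break
    return filled


def chunkwise_impute(X, n_chunks=4, **kwargs):
    """Split the rows into chunks and impute each chunk independently."""
    chunks = np.array_split(np.asarray(X, dtype=float), n_chunks, axis=0)
    return np.vstack([rpca_impute(chunk, **kwargs) for chunk in chunks])
```

Because each chunk is processed on its own, the list comprehension in `chunkwise_impute` could be replaced by a parallel map; the sketch keeps it sequential for clarity.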

Keywords: Principal components; Missing data; Eigenspace arithmetics
Date: 2022

Downloads: (external link)
http://link.springer.com/10.1007/s10260-021-00575-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:stmapp:v:31:y:2022:i:2:d:10.1007_s10260-021-00575-5

Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10260/PS2

DOI: 10.1007/s10260-021-00575-5

Statistical Methods & Applications is currently edited by Tommaso Proietti

More articles in Statistical Methods & Applications from Springer, Società Italiana di Statistica
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:stmapp:v:31:y:2022:i:2:d:10.1007_s10260-021-00575-5