EconPapers    
Economics at your fingertips  
 

Dimensionality reduction for binary data through the projection of natural parameters

Andrew J. Landgraf and Yoonkyung Lee

Journal of Multivariate Analysis, 2020, vol. 180, issue C

Abstract: Principal component analysis (PCA) for binary data, known as logistic PCA, has become a popular alternative to dimensionality reduction of binary data. It is motivated as an extension of ordinary PCA by means of a matrix factorization, akin to the singular value decomposition, that maximizes the Bernoulli log-likelihood. We propose a new formulation of logistic PCA which extends Pearson’s formulation of a low dimensional data representation with minimum error to binary data. Our formulation does not require a matrix factorization, as previous methods do, but instead looks for projections of the natural parameters from the saturated model. Due to this difference, the number of parameters does not grow with the number of observations and the principal component scores on new data can be computed with simple matrix multiplication. We derive explicit solutions for data matrices of special structure and provide a computationally efficient algorithm for solving for the principal component loadings. Through simulation experiments and an analysis of medical diagnoses data, we compare our formulation of logistic PCA to the previous formulation as well as ordinary PCA to demonstrate its benefits.

Keywords: Binary data; Exponential family; Logistic PCA; Principal component analysis (search for similar items in EconPapers)
Date: 2020
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (5)

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0047259X20302499
Full text for ScienceDirect subscribers only

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:eee:jmvana:v:180:y:2020:i:c:s0047259x20302499

Ordering information: This journal article can be ordered from
http://www.elsevier.com/wps/find/supportfaq.cws_home/regional
https://shop.elsevie ... _01_ooc_1&version=01

DOI: 10.1016/j.jmva.2020.104668

Access Statistics for this article

Journal of Multivariate Analysis is currently edited by de Leeuw, J.

More articles in Journal of Multivariate Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().

 
Page updated 2025-03-19
Handle: RePEc:eee:jmvana:v:180:y:2020:i:c:s0047259x20302499