EconPapers    
Economics at your fingertips  
 

Archetypal Analysis With Missing Data: See All Samples by Looking at a Few Based on Extreme Profiles

Irene Epifanio, M. Victoria Ibáñez and Amelia Simó

The American Statistician, 2020, vol. 74, issue 2, 169-183

Abstract: In this article, we propose several methodologies for handling missing or incomplete data in archetype analysis (AA) and archetypoid analysis (ADA). AA seeks to find archetypes, which are convex combinations of data points, and to approximate the samples as mixtures of those archetypes. In ADA, the representative archetypal data belong to the sample, that is, they are actual data points. With the proposed procedures, missing data are not discarded or previously filled by imputation and the theoretical properties regarding location of archetypes are guaranteed, unlike the previous approaches. The new procedures adapt the AA algorithm either by considering the missing values in the computation of the solution or by skipping them. In the first case, the solutions of previous approaches are modified to fulfill the theory and a new procedure is proposed, where the missing values are updated by the fitted values. In this second case, the procedure is based on the estimation of dissimilarities between samples and the projection of these dissimilarities in a new space, where AA or ADA is applied, and those results are used to provide a solution in the original space. A comparative analysis is carried out in a simulation study, with favorable results. The methodology is also applied to two real datasets: a well-known climate dataset and a global development dataset. We illustrate how these unsupervised methodologies allow complex data to be understood, even by nonexperts. Supplementary materials for this article are available online.

Date: 2020
References: Add references at CitEc
Citations:

Downloads: (external link)
http://hdl.handle.net/10.1080/00031305.2018.1545700 (text/html)
Access to full text is restricted to subscribers.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:taf:amstat:v:74:y:2020:i:2:p:169-183

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/UTAS20

DOI: 10.1080/00031305.2018.1545700

Access Statistics for this article

The American Statistician is currently edited by Eric Sampson

More articles in The American Statistician from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst ().

 
Page updated 2025-03-20
Handle: RePEc:taf:amstat:v:74:y:2020:i:2:p:169-183