
Clustering Big Data by Extreme Kurtosis Projections

Janeth Carolina Rendon Aguirre
Authors registered in the RePEc Author Service: Daniel Peña

DES - Working Papers. Statistics and Econometrics. WS from Universidad Carlos III de Madrid. Departamento de Estadística

Abstract: Clustering Big Data is an important problem because large samples of many variables are usually heterogeneous and include mixtures of several populations. Often only some of a large set of variables are useful for clustering, and working with all of them would be very inefficient and may make the clusters harder to identify. Thus, searching for lower-dimensional spaces that contain all the relevant information about the clusters seems a sensible way to proceed in these situations. Peña and Prieto (2001) showed that the extreme kurtosis directions of projected data are optimal when the data have been generated by mixtures of two normal distributions. We generalize this result to mixtures with any number of components and show that the extreme kurtosis directions of the projected data are linear combinations of the optimal discriminant directions that we would use if we knew the centers of the components of the mixture. To separate the groups, we want directions that split the data into two groups, each corresponding to different components of the mixture. We prove that these directions can be found from extreme kurtosis projections. This result suggests a new procedure for dealing with many groups that works as a sequence of binary decisions, deciding at each step whether the data should be split into two groups or the procedure should stop. The decision is based on comparing a single distribution with a mixture of two distributions. The performance of the algorithm is analyzed through a simulation study.
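
The idea of splitting data along an extreme kurtosis projection can be illustrated with a minimal sketch. It is not the paper's algorithm: the two-dimensional grid search over angles, the helper names (`kurtosis`, `whiten`, `extreme_kurtosis_directions_2d`), the simulated data, and the split at zero are all assumptions made for illustration. In particular, the split at zero stands in for the paper's decision rule, which compares a single distribution with a mixture of two distributions. For a balanced mixture of two normal components, the projection that separates the groups is bimodal and therefore has low kurtosis, so the sketch looks at the minimum.

```python
import numpy as np


def kurtosis(z):
    """Kurtosis coefficient (fourth standardized moment) of a 1-D sample."""
    z = z - z.mean()
    return np.mean(z ** 4) / np.mean(z ** 2) ** 2


def whiten(x):
    """Center the data and rescale it to identity sample covariance."""
    xc = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    return xc @ evecs @ np.diag(evals ** -0.5) @ evecs.T


def extreme_kurtosis_directions_2d(x, n_angles=360):
    """Grid search (illustrative only) over unit directions in the plane.

    Returns the whitened data, the directions of minimum and maximum
    kurtosis of the projected data, and the kurtosis at every angle.
    """
    y = whiten(x)
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    k = np.array([kurtosis(y @ d) for d in dirs])
    return y, dirs[k.argmin()], dirs[k.argmax()], k


# Simulated data (assumption): two equally sized normal components
# whose centers differ only along the first coordinate.
rng = np.random.default_rng(0)
n = 500
x = np.vstack([
    rng.normal(size=(n, 2)) + np.array([-3.0, 0.0]),
    rng.normal(size=(n, 2)) + np.array([3.0, 0.0]),
])
labels = np.repeat([0, 1], n)

y, d_min, d_max, k = extreme_kurtosis_directions_2d(x)

# Project on the minimum-kurtosis direction and split at zero
# (a simplification; the data are centered by the whitening step).
proj = y @ d_min
split = proj > 0
# The sign of a direction is arbitrary, so score both labelings.
accuracy = max(np.mean(split == labels), np.mean(split != labels))

print("minimum-kurtosis direction:", d_min)
print("kurtosis along it:", kurtosis(proj))
print("agreement with true components:", accuracy)
```

In higher dimensions a grid over angles is not feasible, and the direction would have to be found by numerical optimization of the kurtosis coefficient instead.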

Keywords: High dimension; Projection Pursuit; Mixture models
Date: 2017-04-27
New Economics Papers: this item is included in nep-ecm and nep-pay
Downloads: (external link)
https://e-archivo.uc3m.es/rest/api/core/bitstreams ... 81d9deb79e45/content (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:cte:wsrepe:24522

Bibliographic data for series maintained by Ana Poveda.

 
Page updated 2025-03-19
Handle: RePEc:cte:wsrepe:24522