EconPapers    
Economics at your fingertips  
 

Addressing overfitting and underfitting in Gaussian model-based clustering

Jeffrey L. Andrews

Computational Statistics & Data Analysis, 2018, vol. 127, issue C, 160-171

Abstract: The expectation–maximization (EM) algorithm is a common approach for parameter estimation in the context of cluster analysis using finite mixture models. This approach suffers from the well-known issue of convergence to local maxima, but also the less obvious problem of overfitting. These combined, and competing, concerns are illustrated through simulation and then addressed by introducing an algorithm that augments the traditional EM with the nonparametric bootstrap. Further simulations and applications to real data lend support for the usage of this bootstrap augmented EM-style algorithm to avoid both overfitting and local maxima.

Keywords: EM algorithm; Bootstrap; Cluster analysis; Mixture models (search for similar items in EconPapers)
Date: 2018
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947318301245
Full text for ScienceDirect subscribers only.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:127:y:2018:i:c:p:160-171

DOI: 10.1016/j.csda.2018.05.015

Access Statistics for this article

Computational Statistics & Data Analysis is currently edited by S.P. Azen

More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().

 
Page updated 2025-03-19
Handle: RePEc:eee:csdana:v:127:y:2018:i:c:p:160-171