H-NGPCA: Hierarchical clustering of data streams with adaptive number of clusters and adaptive dimensionality
Nico Migenda, Ralf Möller and Wolfram Schenck
PLOS ONE, 2026, vol. 21, issue 1, 1-32
Abstract:
We present H-NGPCA, a hierarchical clustering algorithm for data streams that integrates adaptive growth of the number of units with local dimensionality control. Unlike existing algorithms, H-NGPCA combines the characteristics of centroid-based, model-based and hierarchical clustering. H-NGPCA builds a hierarchical structure of local Principal Component Analysis (PCA) units, where each unit is a hyper-ellipsoid whose shape is updated by a neural network-based online PCA method. The re-positioning of each unit is handled by Neural Gas, a centroid-based clustering algorithm. In the hierarchical tree structure, new units are created in a branch when a splitting criterion suggests it. In addition, each unit determines its own dimensionality based on the data it represents. In extensive benchmarks, H-NGPCA not only surpasses all competing online algorithms with adaptive unit numbers but also achieves performance competitive with state-of-the-art offline methods, reaching an average NMI of 0.87 and an average CI of 0.26. This demonstrates that H-NGPCA achieves both online adaptability and offline-level accuracy.
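To make two of the ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It shows a generic rank-based Neural Gas update of unit centers for one stream sample, and one possible way a unit could pick its own dimensionality from its local PCA eigenvalues. The function names, the parameters `eps`, `lam` and `explained`, and the explained-variance rule itself are assumptions made for illustration; the paper's splitting criterion and its neural network-based online PCA are not reproduced here.

```python
import numpy as np

def neural_gas_step(centers, x, eps=0.05, lam=1.0):
    """One generic Neural Gas update: every center moves toward sample x,
    weighted by its distance rank (rank 0 = closest unit).
    eps (step size) and lam (neighborhood range) are illustrative values."""
    dists = np.linalg.norm(centers - x, axis=1)
    ranks = np.argsort(np.argsort(dists))      # rank of each unit by distance
    h = np.exp(-ranks / lam)                   # rank-based neighborhood weight
    return centers + eps * h[:, None] * (x - centers)

def local_dimensionality(eigenvalues, explained=0.95):
    """Assumed rule (not from the paper): keep the smallest number of
    principal components whose eigenvalues reach the given variance share."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cum = np.cumsum(ev) / ev.sum()
    k = int(np.searchsorted(cum, explained)) + 1
    return min(k, len(ev))                     # guard against rounding at 1.0

# Usage: two units, one incoming sample from the stream.
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
updated = neural_gas_step(centers, np.array([1.0, 0.0]), eps=0.5, lam=1.0)
dim = local_dimensionality([4.0, 1.0, 0.1], explained=0.95)
```

In this sketch the closest unit takes the full step toward the sample while more distant units move less, which is the behavior that distinguishes Neural Gas from a pure winner-takes-all update.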
Date: 2026
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0339171 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 39171&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0339171
DOI: 10.1371/journal.pone.0339171
More articles in PLOS ONE from Public Library of Science