Vertex clustering in diverse dynamic networks
Devavrat Vivek Dabke and
Olga Dorabiala
PLOS Complex Systems, 2024, vol. 1, issue 4, 1-29
Abstract:
We present theoretical and experimental results for spatiotemporal graph k-means (STGkM)—a new unsupervised method to cluster vertices within a dynamic network. STGkM finds both short-term dynamic clusters and a “long-lived” partition of vertices within a network whose topology is evolving over time; we first introduced this technique in a recent conference paper. Here, we update our algorithm with a more efficient relaxation scheme, provide additional theoretical results, compare its performance to several other methods, and demonstrate its capabilities on real, diverse datasets. We construct a theoretical foundation to distinguish STGkM from connected components and static clustering and prove results for the stochastic setting for the first time. In addition to our previous experiments on the United States House of Representatives dataset, we report new state-of-the-art empirical results on a dynamic scientific citation network and Reddit dataset. These findings demonstrate that STGkM is accurate, efficient, informative, and operates well in diverse settings. Finally, as previously noted, one of the main advantages of STGkM is that it has only one required parameter: k, the number of clusters; we therefore include an extended analysis of the range of this parameter and guidance on selecting its optimal value. Our data and code are available on Github; see: https://github.com/dynestic/stgkm.Author summary: With the explosion of data about the world around us, we must constantly develop new ways of studying datasets. One popular method for analysis is k-means, which can identify clusters of related objects based on shared characteristics. Traditionally, k-means worked over sets of objects (e.g. animal subjects in a biology study) for which it was possible to define a list of consistent, numerical, and complete “features” or pieces of information (like age or weight). Over the years, this algorithm has been adapted for a variety of datasets and implemented in efficient code libraries. Our paper builds on this vast literature to extend k-means to the setting of networks that change over time, and we provide a practical and efficient implementation of it for real-world usage. We show that our new algorithm works on datasets like the United States House of Representatives roll call votes, citation networks from major publications, and a sample of Reddit posts. We also provide formal mathematical proofs and demonstrate the theoretical soundness of our technique.
Date: 2024
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
https://journals.plos.org/complexsystems/article?id=10.1371/journal.pcsy.0000023 (text/html)
https://journals.plos.org/complexsystems/article/f ... 00023&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcsy00:0000023
DOI: 10.1371/journal.pcsy.0000023
Access Statistics for this article
More articles in PLOS Complex Systems from Public Library of Science
Bibliographic data for series maintained by complexsystem ().