AggMon: Scalable Hierarchical Cluster Monitoring
Erich Focht () and
Andreas Jeutter ()
Additional contact information
Erich Focht: NEC HPC Europe
Andreas Jeutter: NEC HPC Europe
A chapter in Sustained Simulation Performance 2012, 2013, pp 51-64 from Springer
Abstract:
Abstract Monitoring and supervising a huge number of compute nodes within a typical HPC cluster is an expensive task. Expensive in the sense of occupying bandwidth, and CPU power that would be better spend for application needs. In this paper, we describe a monitoring framework that is used to supervise thousands of compute nodes in a HPC cluster computer in an efficient way. Within this framework the compute nodes are organized in groups. Groups contain other groups and form a tree-like hierarchical graph. Communication paths are strictly along the edges of the graph. To decouple the components in the network a publish/subscribe messaging system based on AMQP has been chosen. Monitoring data is stored within a distributed time-series database that is located on dedicated nodes in the tree. For database queries and other administrative tasks a synchronous RPC channel, that is completely independent of the hierarchy has been implemented. A browser-based front-end to present the data to the user is currently in development.
Keywords: Monitoring Framework; Master Group; System State Tracking; Time-value Pairs; Database Instance (search for similar items in EconPapers)
Date: 2013
References: Add references at CitEc
Citations:
There are no downloads for this item, see the EconPapers FAQ for hints about obtaining it.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-642-32454-3_5
Ordering information: This item can be ordered from
http://www.springer.com/9783642324543
DOI: 10.1007/978-3-642-32454-3_5
Access Statistics for this chapter
More chapters in Springer Books from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().