AggMon: Scalable Hierarchical Cluster Monitoring

Erich Focht and Andreas Jeutter
Additional contact information
Erich Focht: NEC HPC Europe
Andreas Jeutter: NEC HPC Europe

A chapter in Sustained Simulation Performance 2012, 2013, pp 51-64 from Springer

Abstract: Monitoring and supervising a huge number of compute nodes within a typical HPC cluster is an expensive task: it occupies network bandwidth and CPU power that would be better spent on application needs. In this paper, we describe a monitoring framework that is used to supervise thousands of compute nodes in an HPC cluster in an efficient way. Within this framework the compute nodes are organized in groups. Groups can contain other groups and form a tree-like hierarchical graph. Communication paths run strictly along the edges of the graph. To decouple the components in the network, a publish/subscribe messaging system based on AMQP has been chosen. Monitoring data is stored in a distributed time-series database located on dedicated nodes in the tree. For database queries and other administrative tasks, a synchronous RPC channel that is completely independent of the hierarchy has been implemented. A browser-based front-end to present the data to the user is currently in development.
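
Illustrative sketch (not taken from the chapter): the abstract's core idea, groups arranged in a tree with monitoring data published along tree edges until it reaches a node that stores time-value pairs, can be pictured with a few lines of Python. Everything here is an assumption made for illustration: the `Broker` class is a toy in-memory stand-in for an AMQP broker and does not speak AMQP, and the names `Group`, `stores_data`, and the `"metrics"` topic are hypothetical rather than AggMon's actual interfaces.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# (host, metric, timestamp, value); a hypothetical sample layout
Sample = Tuple[str, str, float, float]


class Broker:
    """Toy in-memory publish/subscribe broker (stand-in for AMQP, not AMQP itself)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Sample], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Sample], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, sample: Sample) -> None:
        # Publishers never address consumers directly; they only name a topic.
        for handler in self._subscribers[topic]:
            handler(sample)


class Group:
    """One node of the group tree. Each group owns a broker (an assumption)."""

    def __init__(self, name: str, parent: Optional["Group"] = None,
                 stores_data: bool = False) -> None:
        self.name = name
        self.parent = parent
        self.broker = Broker()
        # Only groups flagged as storage nodes keep time-value pairs.
        self.series = defaultdict(list) if stores_data else None
        self.broker.subscribe("metrics", self._on_metric)

    def _on_metric(self, sample: Sample) -> None:
        if self.series is not None:
            host, metric, timestamp, value = sample
            self.series[(host, metric)].append((timestamp, value))
        elif self.parent is not None:
            # Forward strictly along the tree edge to the parent group.
            self.parent.broker.publish("metrics", sample)
        # A root without storage simply drops the sample in this sketch.


# A two-level tree: rack1 forwards upward, rack2 stores locally.
master = Group("master", stores_data=True)
rack1 = Group("rack1", parent=master)
rack2 = Group("rack2", parent=master, stores_data=True)

rack1.broker.publish("metrics", ("node001", "load", 0.0, 0.5))
rack2.broker.publish("metrics", ("node101", "load", 0.0, 1.5))
```

In this sketch the sample from `node001` ends up in the master group's store because `rack1` has no storage of its own, while the sample from `node101` stays in `rack2`. The synchronous RPC channel for queries that the abstract mentions is deliberately left out.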

Keywords: Monitoring Framework; Master Group; System State Tracking; Time-value Pairs; Database Instance
Date: 2013

Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-642-32454-3_5

Ordering information: This item can be ordered from
http://www.springer.com/9783642324543

DOI: 10.1007/978-3-642-32454-3_5


 
Page updated 2026-05-29
Handle: RePEc:spr:sprchp:978-3-642-32454-3_5