HPC Clusters – Many Components – Many Point Solutions
Scyld ClusterWare simplifies the administration of all operating system instances in an HPC cluster. System administration of the entire cluster becomes as easy as the administration of a single system.
Beyond the operating system, every cluster consists of multiple hardware and software components that need to be managed and monitored: hardware components such as servers, storage, switches and interconnects, and software components such as workload managers and applications. All of these components need to work together seamlessly for the cluster to be operational.
Point solutions such as IPMI-based command line interfaces or workload managers provide visibility into specific aspects of a cluster’s status and health. Typically, a cluster administrator has to use a separate tool for each component to monitor and manage all layers of a cluster.
Scyld Integrated Management Framework – Integrating it all
Scyld IMF integrates these disparate tools into an extensible, web-based framework that provides a single, unified view of cluster hardware and software assets.
- Scyld IMF leverages the Intelligent Platform Management Interface (IPMI) and status information aggregated by Scyld ClusterWare
- Scyld IMF provides current and historical information on metrics that determine the state of a cluster, such as CPU load, available memory and IPMI events, all the way down to the speed of a fan
- Scyld IMF provides asset information on all servers in your cluster
- Scyld IMF allows for web-based server control, complementing Scyld ClusterWare’s ability to self-heal through a power cycle process
- Compute jobs that have been submitted to the cluster through the workload managers TORQUE and Scyld TaskMaster can be monitored and controlled through IMF
Scyld IMF’s web-based interface is a rich internet application built on standard components: JavaScript, the ExtJS library and XML. This approach makes it easy to extend the framework and integrate custom monitoring tools, for example tools for monitoring the status of Ethernet or InfiniBand switches. Scyld IMF is packaged with the current releases of Scyld ClusterWare 4 and 5.


