Managing Multiple Master Nodes

ClusterWare supports up to four master nodes on the same private cluster network. Every master node of a given cluster typically references a common /etc/beowulf/config file, which means all master nodes share a common view of the compute nodes attached to the private cluster network. That is, each master node knows a given physical node (identified by its MAC addresses) by the same IP address and hostname. The config file's masterorder directive specifies which master node controls which compute nodes. Additionally, every master node should use consistent userID (/etc/passwd) and groupID (/etc/group) values.
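
One simple way to confirm that the masters agree is to compare those shared files across masters. The following is a minimal sketch, assuming passwordless root ssh between the masters and a hypothetical peer hostname master2 (repeat for each additional master):

  # Compare the shared cluster config and the user/group databases between
  # this master and a peer master (hypothetical hostname "master2").
  for f in /etc/beowulf/config /etc/passwd /etc/group; do
      ssh master2 cat "$f" | diff -u "$f" - && echo "$f matches" || echo "WARNING: $f differs"
  done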

Active-Passive Masters

In a simple active-passive configuration, all compute nodes are "owned" by one and only one master node at any one time, and the secondary master node (or nodes) comes into play only if and when the primary master fails. A compute node's self-reassignment of ownership is called "cold re-parenting", as it only occurs when a node reboots.

For example, for a cluster with two master nodes and 32 compute nodes, the /etc/beowulf/config file on each master node contains the entry:

masterorder 0-31 10.1.1.1 10.1.1.2

or alternatively, an entry that uses a hyphen in place of explicit node numbers:

masterorder - 10.1.1.1 10.1.1.2

where the IP addresses are the static addresses assigned to the two masters. When a compute node boots, each master node interprets the same masterorder directive and knows that master 10.1.1.1 is the primary master that nominally "owns" all the nodes, and that 10.1.1.2 is the secondary master, which steps in only if the primary master is unresponsive.
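
To confirm which compute nodes a given master currently controls, the administrator can run ClusterWare's bpstat on each master. The following is a brief sketch, not an exact transcript; the output columns vary by release:

  # On the primary master (10.1.1.1), bpstat lists the compute nodes known
  # to this master and their current state; in normal operation all of
  # nodes 0-31 are booted and controlled by this master.
  bpstat
  # Running the same command on the secondary master (10.1.1.2) shows its
  # view of the nodes; it takes control only after a primary failure and a
  # compute node reboot (cold re-parenting).
  bpstat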

Active-Active Masters

Many labs and workgroups today have several compute clusters, where each one is dedicated to a different research team or engineering group, or is used to run different applications. When an unusually large job needs to execute, it may be useful to combine most or all of the nodes into a single larger cluster, and then afterwards split up the cluster when the job is completed. Also, the overall demand for particular applications may change over time, requiring changes in the allocation of nodes to applications.

The downside to this approach of using multiple discrete clusters, each with its own private cluster network, is that reconfiguring compute nodes requires physically rewiring the network cabling or reprogramming a smart switch to move nodes from one discrete network to another.

However, with an active-active configuration, the cluster's master nodes and compute nodes reside on the same common private cluster network. The nodes are divided into subsets, and each subset is actively "owned" by a different master node and perhaps dedicated to separate users and applications. Additionally, each subset is passively associated with other master nodes.

For example, suppose each master node's /etc/beowulf/config contains:

masterorder  0-15 10.1.1.1 10.1.1.2 10.1.1.3
masterorder 16-31 10.1.1.2 10.1.1.1 10.1.1.3

which divides the 32 compute nodes into two subsets of 16, with one subset owned by master 10.1.1.1 and the other subset owned by 10.1.1.2. To add complexity to this example, we introduce a passive third master node, 10.1.1.3, which becomes active only if both 10.1.1.1 and 10.1.1.2 fail. This configuration provides several advantages over two discrete 16-node clusters. One advantage is the same as that provided by an active-passive configuration: in the event of a failure of one master node, that master's compute nodes automatically reboot and "cold re-parent" to the other master node, which then becomes the active "owner" of all 32 compute nodes.

Another advantage is that the cluster administrator can easily respond to changing demands for computing resources through a controlled and methodical migration of nodes between masters. For example, the administrator can shift eight nodes, n16 through n23, from master 10.1.1.2 to master 10.1.1.1 by changing the masterorder entries to be:

masterorder  0-23 10.1.1.1 10.1.1.2 10.1.1.3
masterorder 24-31 10.1.1.2 10.1.1.1 10.1.1.3

and replicating this same change to all other master nodes. Then the administrator executes the command service beowulf reload on every master node, which instructs the beoserv and bpmaster daemons to re-read the changed /etc/beowulf/config. Finally, on the currently "owning" master the administrator executes the command bpctl -S 16-23 -R, which reboots those eight shifted nodes and thereby causes them to cold re-parent to their new master node.
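
Put together, the shift of n16 through n23 might look like the following sequence run on master 10.1.1.1. This is a minimal sketch, assuming passwordless root ssh/scp between the masters and the hypothetical peer hostnames master2 (10.1.1.2) and master3 (10.1.1.3):

  # 1. Edit the masterorder entries on this master (10.1.1.1).
  vi /etc/beowulf/config
  # 2. Replicate the changed config file to the other masters.
  scp /etc/beowulf/config master2:/etc/beowulf/config
  scp /etc/beowulf/config master3:/etc/beowulf/config
  # 3. Instruct beoserv and bpmaster on every master to re-read the config.
  service beowulf reload
  ssh master2 service beowulf reload
  ssh master3 service beowulf reload
  # 4. On the current owner of n16-n23 (10.1.1.2), reboot those nodes so
  #    they cold re-parent to their new primary master, 10.1.1.1.
  ssh master2 bpctl -S 16-23 -R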

Reversing this reconfiguration, or performing any other reconfiguration, is equally simple:

  1. Edit /etc/beowulf/config on one master to change the masterorder entries,
  2. Replicate these same changes (or copy the same config file) to every affected master node,
  3. Execute service beowulf reload on each master node to re-read the config file, and
  4. Execute bpctl -S <noderange> -R on the current "owning" master node, where <noderange> is the range of affected nodes, which tells the affected node(s) to reboot and re-parent to their new active master.
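
For repeated reconfigurations, these four steps can be captured in a small helper script. The following is an illustrative sketch only; the script name, variables, and hostnames are hypothetical, and it assumes the masterorder edits have already been made on the local master and that passwordless root ssh is configured between masters:

  #!/bin/sh
  # reparent-nodes.sh -- replicate an already-edited /etc/beowulf/config,
  # reload the ClusterWare daemons, and reboot the shifted nodes.
  NODERANGE="16-23"                # nodes being shifted to a new master
  CURRENT_OWNER="master2"          # master that currently owns NODERANGE
  PEER_MASTERS="master2 master3"   # every other master on the cluster network

  for m in $PEER_MASTERS; do
      scp /etc/beowulf/config $m:/etc/beowulf/config
      ssh $m service beowulf reload
  done
  service beowulf reload           # reload on this (local) master as well

  # Reboot the shifted nodes from their current owner so they re-parent.
  ssh $CURRENT_OWNER bpctl -S $NODERANGE -R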