XenSource

2.6. Coping with machine failures

Machine failures come in two varieties: master failures and slave failures. These will be considered separately.

2.6.1. Slave failures

Master nodes detect slave failures by monitoring regular heartbeat messages from each slave. If no heartbeat has been received from a slave for 30 seconds, the master assumes that the slave is dead. There are two ways to recover from this situation:

  1. Repair the dead slave (e.g. by physically rebooting it). When the connection to the slave is restored the master will mark the slave as alive again.

  2. Instruct the master to forget about the slave node using the xe host-forget command. Once the slave has been forgotten, all the VMs that were running there are marked as offline and can be restarted on other hosts. Before doing this, make sure that the host really is offline; if it is still running, VM data corruption may result.
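The second option can be sketched as a small shell function. This is only an illustration: the `run` and `forget_dead_slave` helpers are hypothetical names introduced here, the host UUID is assumed to have been looked up beforehand (for example with xe host-list), and the dry-run default is a safety choice of this sketch rather than a feature of xe.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set,
# so the destructive step can be reviewed before it is executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: forget a slave that is confirmed to be offline.
# $1 is the UUID of the dead slave (assumed to be known already).
forget_dead_slave() {
    host_uuid="$1"
    if [ -z "$host_uuid" ]; then
        echo "usage: forget_dead_slave <host-uuid>" >&2
        return 1
    fi
    run xe host-forget uuid="$host_uuid"
}
```

Calling `forget_dead_slave <host-uuid>` on the master prints the command that would be run; setting `DRY_RUN=0` executes it for real.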

When a slave host fails, some VMs may still be registered in the “running” state. If you are certain that the slave host is down and that the VMs have not been brought up on another host in the pool, use the xe vm-reset-powerstate command to forcibly mark the VMs as halted. See Section 5.4.20.20, “vm-reset-powerstate” for more details.
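A minimal sketch of resetting several VMs in one pass follows. The `run` and `reset_vms` helpers are hypothetical names, and the sketch assumes that the list of VM UUIDs is supplied as a comma-separated string (the format printed by the `--minimal` option of xe list commands) and that `force=true` is accepted by vm-reset-powerstate in your release; check Section 5.4.20.20 before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: reset the power state of every VM in a
# comma-separated UUID list. Only use this when the failed host is
# known to be down and the VMs are not running anywhere else.
reset_vms() {
    vm_uuids="$1"
    # Turn the commas into spaces so the shell splits the list into words.
    for vm in $(printf '%s' "$vm_uuids" | tr ',' ' '); do
        run xe vm-reset-powerstate uuid="$vm" force=true
    done
}
```

One possible way to build the list is `reset_vms "$(xe vm-list resident-on=<host-uuid> power-state=running --minimal)"`, assuming the resident-on filter is available in your version.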

2.6.2. Master failures

Every member of a Resource Pool contains all the information necessary to take over the role of master if required. When a master node fails, the following sequence of events occurs:

  1. The slaves realize that communication has been lost, and each retries for sixty seconds.

  2. Each slave then puts itself into emergency mode, in which the slave XenServer Host accepts only the pool-emergency commands (xe pool-emergency-reset-master and xe pool-emergency-transition-to-master).

If the master comes back up at this point, it will reestablish communication with its slaves, they will leave emergency mode, and operation will return to normal.

If the master has failed permanently, choose one of the slaves and issue the command xe pool-emergency-transition-to-master on it. Once it has become the master, issue the command xe pool-recover-slaves so that the remaining slaves point to the new master.
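The two recovery commands above can be wrapped in a short script, run on the slave that has been chosen as the new master. The `run` and `promote_to_master` helpers are hypothetical names used only for this sketch, and the dry-run default is a precaution added here.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: promote this host, then repoint the other slaves.
# Run it only after confirming that the old master will not come back.
promote_to_master() {
    # Step 1: make this slave the pool master.
    run xe pool-emergency-transition-to-master || return 1
    # Step 2: tell the remaining slaves to follow the new master.
    run xe pool-recover-slaves
}
```

The second command is skipped if the first one fails, so the slaves are never told to follow a host that did not actually become the master.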

If you repair or replace the server that was the original master, you can simply bring it up, install the XenServer Host software, and add it to the pool. Because the XenServer Hosts in a pool are required to be homogeneous, there is no real need to make the replaced server the master again.

When a slave host has been transitioned to master, you should also check that the default pool storage repository is set to an appropriate value. To do this, run the xe pool-param-list command and verify that the default-SR parameter points to a valid storage repository.
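As a rough sketch, the default-SR check can be scripted by filtering the output of xe pool-param-list. The helper names below are hypothetical, and the parsing assumes that each parameter is printed on its own line as `name ( RW): value` and that an unset reference appears as `<not in database>`; adjust the pattern if your release formats the output differently.

```shell
#!/bin/sh
# Hypothetical helper: read `xe pool-param-list` output on stdin and
# print only the value of the default-SR parameter.
default_sr_from_params() {
    sed -n 's/^[[:space:]]*default-SR[^:]*:[[:space:]]*//p'
}

# Hypothetical helper: report whether default-SR appears to be set.
# Returns non-zero when the value is missing or not in the database.
check_default_sr() {
    sr="$(default_sr_from_params)"
    case "$sr" in
        ""|"<not in database>")
            echo "default-SR is not set"
            return 1
            ;;
        *)
            echo "default-SR: $sr"
            ;;
    esac
}
```

A typical invocation would be `xe pool-param-list uuid=<pool-uuid> | check_default_sr`. The script only confirms that a value is present; whether the reported storage repository is still valid has to be checked separately, for example with xe sr-list.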