This was written back in 2005-2006, and hasn't been used or tested in the last few years, so take it with a pinch of salt.

Disaster Tolerant Services using Virtual Machine Monitors

Overview

The aim of this project is to present a disaster tolerant virtual machine that can automatically recover from physical host failures. No alterations are made to the guests, meaning it is possible to have a disaster tolerant MS-DOS server...

The virtual machine is constructed using VMware Server, though it should equally be possible using Xen or any other virtual machine monitor.

Disaster tolerance is provided by two physical hosts (which may be at physically remote sites), each running a number of virtual machines. When one physical host becomes unavailable, the virtual machines it hosted are restarted by the second physical host.

This configuration allows the service virtual machines to be load-balanced between the two physical hosts in normal operation. Remember, though, that either physical host may be required to host all of the virtual machines, so specify the hardware accordingly. It is down to local preference whether service during a 'disaster' event may continue in a degraded state (which may be the case in a load-balancing environment), or must continue with no performance hit (which would require a non-load-balancing environment).
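
The sizing rule above is simple arithmetic, and a rough check might look like the sketch below. The function name and all the memory figures are made up for illustration; RAM is only one of the resources that would need checking.

```shell
#!/bin/sh
# Check that one host could carry every VM on its own.
# All figures are illustrative assumptions, in megabytes of RAM.

# fits_on_one_host HOST_MB RESERVED_MB VM_MB...
# Succeeds when the VM allocations fit in HOST_MB minus RESERVED_MB.
fits_on_one_host() {
    host_mb=$1
    reserved_mb=$2
    shift 2
    total=0
    for vm_mb in "$@"; do
        total=$((total + vm_mb))
    done
    [ "$total" -le $((host_mb - reserved_mb)) ]
}

# Example: 4 GB host, 512 MB kept for the host OS, four VMs.
if fits_on_one_host 4096 512 1024 1024 512 512; then
    echo "ok: a single host can run all VMs"
else
    echo "undersized: service would degrade or fail after a takeover"
fi
```

If the check fails, the choice is between buying bigger hosts and accepting degraded service during a 'disaster' event.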

Technical overview

Two physical x86 hosts mirror some of their local disk storage with each other using DRBD, the Distributed Replicated Block Device system for Linux. The physical hosts' local disk storage is managed by LVM for easy volume management. Each physical host runs the VMware Server suite or Xen, and also runs the Linux High-Availability project's heartbeat system so that the two hosts can monitor each other for 'disaster' events.
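
To give a feel for how DRBD ties an LVM volume on each host together, here is a sketch that prints one resource stanza in the general style of drbd.conf. It is a fragment, not a complete configuration: the function name, the volume group vg0, the IP addresses and the port are all assumptions, and other options your DRBD version requires are left out, so check the drbd.conf man page before using anything like it.

```shell
#!/bin/sh
# Print a drbd.conf-style resource stanza for one mirrored volume.
# Hostnames p1/p2 follow the examples in this document; the
# addresses and the volume path are hypothetical.

# drbd_resource NAME MINOR LV_PATH PORT
drbd_resource() {
    name=$1 minor=$2 lv=$3 port=$4
    cat <<EOF
resource $name {
  protocol C;
  on p1 {
    device    /dev/drbd$minor;
    disk      $lv;
    address   192.168.0.1:$port;
    meta-disk internal;
  }
  on p2 {
    device    /dev/drbd$minor;
    disk      $lv;
    address   192.168.0.2:$port;
    meta-disk internal;
  }
}
EOF
}

drbd_resource vm1 0 /dev/vg0/vm1 7788
```

Each virtual machine gets its own resource, and therefore its own DRBD device, so that it can move between hosts independently of the others.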

Events

Provision a new disaster tolerant virtual machine

  1. Create an identically sized logical volume on both physical hosts.
  2. Configure DRBD to be aware of the new volumes.
  3. Force one physical host to be master for this service.
  4. Make a file system on the new DRBD device and mount it.
  5. Create a new virtual machine, ensuring all virtual disks and configuration files are stored on the newly created file system.
  6. Install the service virtual machine (using the VMware or Xen console applications).
  7. Shut down the service virtual machine.
  8. Stop the forced master configuration.
  9. Configure heartbeat to manage the new virtual machine.
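
As a rough guide, the storage-related steps (1 to 4, and 8) might translate into commands like those printed by the sketch below. It only prints the commands rather than running them. The function name, the /vm mount point, the ext3 choice and the volume group name are assumptions, and the forced-primary flag is the DRBD 0.7 form as far as I recall; later DRBD versions spell it differently, so check the drbdadm documentation.

```shell
#!/bin/sh
# Print (do not run) a command plan for provisioning one VM's storage.
# Assumes the resource has already been added to drbd.conf on both hosts.

# provision_plan RESOURCE SIZE VOLUME_GROUP DRBD_MINOR
provision_plan() {
    res=$1 size=$2 vg=$3 minor=$4
    echo "# on both hosts"
    echo "lvcreate -L $size -n $res $vg"
    echo "drbdadm up $res"
    echo "# on the host chosen as initial primary"
    echo "drbdadm -- --do-what-I-say primary $res"
    echo "mkfs.ext3 /dev/drbd$minor"
    echo "mkdir -p /vm/$res"
    echo "mount /dev/drbd$minor /vm/$res"
    echo "# ...create and install the VM under /vm/$res, shut it down, then:"
    echo "umount /vm/$res"
    echo "drbdadm secondary $res"
}

provision_plan vm1 10G vg0 0
```

Printing the plan first makes it easy to review before pasting anything into a root shell on a machine that holds real data.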

A 'disaster' event occurs

Assume 4 service virtual machines (vm1-4) and two physical hosts (p1 and p2). p1 hosts vm1-2 and p2 hosts vm3-4.

A 'disaster' event occurs, making p1 non-functional.

  1. p2 detects lack of heartbeat updates from p1.
  2. p2 becomes the DRBD primary for all DRBD devices formerly hosted by p1.
  3. p2 starts vm1 and vm2.
  4. Normal service is restored.
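
Heartbeat carries out the takeover automatically, but the equivalent manual sequence on p2 would look roughly like the commands printed below. This is a sketch for a VMware setup: the function name, the /vm mount point and the .vmx file locations are assumptions, and nothing is actually executed.

```shell
#!/bin/sh
# Print the commands p2 would effectively run to take over p1's VMs.
# Each argument is RESOURCE:DRBD_MINOR, e.g. vm1:0.

takeover_plan() {
    for spec in "$@"; do
        res=${spec%%:*}      # text before the colon
        minor=${spec##*:}    # text after the colon
        echo "drbdadm primary $res"
        echo "mount /dev/drbd$minor /vm/$res"
        echo "vmware-cmd /vm/$res/$res.vmx start"
    done
}

takeover_plan vm1:0 vm2:1
```

From the guest's point of view this is a cold boot after a power failure, so the guest needs to be able to recover its own file systems and services unattended.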

A 'disaster' event ends

Assume 4 service virtual machines (vm1-4) and two physical hosts (p1 and p2). p1 hosts vm1-2 and p2 hosts vm3-4.

A 'disaster' event has occurred, making p1 non-functional, such that p2 has taken over vm1-2.

  1. p1 starts, contacts the DRBD service on p2, and discovers that its own data is now out of date.
  2. p1 synchronises its DRBD devices so as to be in the same state as p2.
  3. Once synchronised, p1 starts sending heartbeats.
  4. p2 is informed of p1's ability to take back services.
  5. Optional: p2 suspends the services it took over from p1, handing them back to p1.
  6. Normal service is restored.
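
The important point in the sequence above is that nothing is handed back until synchronisation has finished. A minimal sketch of that check is below, based on the cs: (connection state) field that DRBD reports in /proc/drbd. The two sample status lines are illustrative and the exact layout varies between DRBD versions; the function names are my own.

```shell
#!/bin/sh
# Decide from a DRBD status line whether resynchronisation has finished.

# connection_state LINE: print the value of the cs: field.
connection_state() {
    for field in $1; do      # unquoted on purpose: split on whitespace
        case $field in
            cs:*) echo "${field#cs:}"; return 0 ;;
        esac
    done
    return 1
}

# safe_to_fail_back LINE: succeed only when the device reports Connected.
safe_to_fail_back() {
    [ "$(connection_state "$1")" = "Connected" ]
}

# Illustrative status lines: mid-resync, then fully synchronised.
for line in \
    " 0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent" \
    " 0: cs:Connected st:Secondary/Primary ld:Consistent"
do
    if safe_to_fail_back "$line"; then
        echo "ready"
    else
        echo "still syncing"
    fi
done
```

On a live host the line would come from /proc/drbd rather than from a fixed string.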

Technical Steps

All these steps must be performed on both computers. You may find it useful to have a central repository of configuration items that are rsync'd to the VM hosts.

  1. Install a Linux distribution on two computers (tested using Debian).
  2. Configure LVM Volume groups, but leave disk space for the VMs unallocated.
  3. Install DRBD: apt-get install drbd0.7-utils drbd0.7-module-source ; module-assistant auto-install drbd0.7-module
  4. Install Linux heartbeat (tested using version 1, but version 2 should work equally well, if not better): apt-get install heartbeat
  5. Install VMware or Xen.
  6. Set up DRBD and heartbeat.
  7. Set up VMware or Xen.
  8. Read the list of gotchas and be careful not to fall into their trap!
  9. Create the first service virtual machine.
  10. Test the setup by pulling the power on one physical host and enjoy the feeling of redundancy as the second takes over.
  11. Create many more service virtual machines.
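
For the heartbeat part of the setup, each virtual machine ends up as one line in heartbeat version 1's haresources file, naming the preferred host followed by the resources to start in order. The sketch below generates such lines for the four-VM example used earlier. drbddisk and Filesystem are the resource scripts shipped with DRBD and heartbeat; vmguest is a hypothetical script you would write yourself to start and stop a guest, and the /vm mount point and ext3 are assumptions too.

```shell
#!/bin/sh
# Generate haresources entries, one per service virtual machine.

# haresources_line PREFERRED_NODE RESOURCE DRBD_MINOR
haresources_line() {
    node=$1 res=$2 minor=$3
    echo "$node drbddisk::$res Filesystem::/dev/drbd$minor::/vm/$res::ext3 vmguest::$res"
}

# Load-balanced layout: p1 prefers vm1-2, p2 prefers vm3-4.
haresources_line p1 vm1 0
haresources_line p1 vm2 1
haresources_line p2 vm3 2
haresources_line p2 vm4 3
```

The resulting file must be identical on both hosts, which is one reason a central, rsync'd repository of configuration is handy.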

Related Reading


6 March 2006.