At home, I wanted a reliable disk solution for backups and also wanted a big, blank and resizable storage system for virtual machines. I knew I wanted to be able to get at the shared disk remotely from other nodes and wanted to be able to replace broken hardware quickly if something failed. I also didn’t want to spend a lot of time reconfiguring OSs and software in the case of a total system failure.
I have two cheap computers and so I put some big disks in them and mirrored the disks over the network. Instead of using one file server node and RAID1, this is something like a “whole system RAID”. If anything at all breaks in either computer, hosted services can keep running and data is unharmed except for whatever was unsynced in RAM.
To accomplish the disk mirror I used DRBD. DRBD is a special block device designed for high-availability clusters. It mirrors every write at the block-device level across the network to a disk on another machine, much like a RAID1 configuration over the network. It lets you build something like the shared storage devices on a SAN, but without any special hardware. This provides the basic reliability layer.
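To make the “RAID1 over the network” idea concrete, here is a toy Python model of it. This is not how DRBD is implemented. `Disk` and `MirroredDevice` are hypothetical names. The model assumes synchronous mirroring, where a write counts as done only once both copies are stored. It also keeps a simple set of out-of-sync blocks, so a peer that comes back only needs the changed blocks copied to it.

```python
from __future__ import annotations


class Disk:
    """A fake block device: a dict from block number to bytes."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[int, bytes] = {}
        self.failed = False

    def write(self, block: int, data: bytes) -> None:
        if self.failed:
            raise OSError(f"{self.name} has failed")
        self.blocks[block] = data

    def read(self, block: int) -> bytes:
        if self.failed:
            raise OSError(f"{self.name} has failed")
        return self.blocks.get(block, b"\x00")


class MirroredDevice:
    """RAID1 over the network, roughly: every write goes to the local disk
    and to the peer's disk before it counts as done."""

    def __init__(self, local: Disk, peer: Disk):
        self.local = local
        self.peer = peer
        self.peer_connected = True
        # Blocks written while the peer was unreachable (a dirty bitmap).
        self.out_of_sync: set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        self.local.write(block, data)
        if self.peer_connected:
            try:
                self.peer.write(block, data)
                return
            except OSError:
                self.peer_connected = False  # keep running degraded
        self.out_of_sync.add(block)

    def read(self, block: int) -> bytes:
        # Reads are served from the local copy.
        return self.local.read(block)

    def resync(self) -> None:
        """Copy only the blocks that changed while the peer was away."""
        for block in sorted(self.out_of_sync):
            self.peer.write(block, self.local.read(block))
        self.out_of_sync.clear()
        self.peer_connected = True
```

The dirty set is what makes a resync after a short outage cheap. Only the blocks touched while the peer was away have to cross the network again.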

Linux Logical Volume Management (LVM) is a popular tool that lets you manage disk space flexibly. Instead of just partitioning the disk, LVM lets us resize logical volumes on the fly and take snapshots (including keeping a snapshot plus the diffs made since). It also lets us add more physical disks to the volume group as needs grow, and a logical volume can even span several underlying disks. Each logical volume is formatted with a filesystem of its own. I think using LVM will save me some future headaches.
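Here is a sketch of the LVM side. The helpers below just build the command lines for the steps described above:

- turning the DRBD device into a volume group
- creating and growing a logical volume
- adding a disk
- taking a snapshot

The names `/dev/drbd0` and `vg0`, and the sizes, are assumptions. `run()` only prints the commands unless you pass `dry_run=False`.

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical names: adjust to your own DRBD device and volume group.
DRBD_DEVICE = "/dev/drbd0"
VG = "vg0"


def initial_setup(drbd_device: str = DRBD_DEVICE, vg: str = VG) -> list[list[str]]:
    """Turn the mirrored DRBD device into an LVM volume group."""
    return [["pvcreate", drbd_device], ["vgcreate", vg, drbd_device]]


def new_volume(name: str, size: str, vg: str = VG) -> list[list[str]]:
    """Carve out a logical volume and give it its own filesystem."""
    return [["lvcreate", "-L", size, "-n", name, vg],
            ["mkfs.ext3", f"/dev/{vg}/{name}"]]


def grow_volume(name: str, extra: str, vg: str = VG) -> list[list[str]]:
    """Grow the logical volume first, then the filesystem inside it."""
    return [["lvextend", "-L", f"+{extra}", f"/dev/{vg}/{name}"],
            ["resize2fs", f"/dev/{vg}/{name}"]]


def add_disk(pv: str, vg: str = VG) -> list[list[str]]:
    """Bring another physical (or DRBD) device into the volume group."""
    return [["pvcreate", pv], ["vgextend", vg, pv]]


def snapshot(name: str, cow_size: str, vg: str = VG) -> list[list[str]]:
    """Copy-on-write snapshot; cow_size bounds how many changes it can hold."""
    return [["lvcreate", "-s", "-L", cow_size, "-n", f"{name}-snap",
             f"/dev/{vg}/{name}"]]


def run(plan: list[list[str]], dry_run: bool = True) -> None:
    for argv in plan:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(initial_setup() + new_volume("vmimage001", "20G"))
```

Growing an ext3 filesystem while it is mounted depends on kernel support. If in doubt, unmount it before running `resize2fs`.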
That is how the disk is set up. Now, how do you access it remotely? You could of course run a shared filesystem, exporting it via an NFS server on one of the two mirrored hosts (call them A and B). Instead, having heard good things about Global Network Block Device (GNBD) on the Xen mailing lists, I chose to export the logical block devices (from LVM) directly over the network with GNBD. Another node does a GNBD import, and the device shows up there as a local block device, ready to mount into the file hierarchy. This is like iSCSI, but it is a snap to set up and use.
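On the file server, the export side might look roughly like the following sketch. The invocations `gnbd_serv`, `gnbd_export -d <device> -e <name>` and `gnbd_import -i <server>` follow the GNBD usage notes as I understand them. Check the flags against your GNBD version, since cluster setups may need extra options. The LV paths are assumptions.

```python
from __future__ import annotations


def export_commands(lv_paths: dict[str, str]) -> list[list[str]]:
    """Server side: start the GNBD server once, then export each LV by name."""
    cmds = [["gnbd_serv"]]
    for name, path in sorted(lv_paths.items()):
        cmds.append(["gnbd_export", "-d", path, "-e", name])
    return cmds


def import_commands(server: str) -> list[list[str]]:
    """Client side: import every device the server exports."""
    return [["gnbd_import", "-i", server]]


def imported_device(name: str) -> str:
    """Where an imported export shows up on the client."""
    return f"/dev/gnbd/{name}"


if __name__ == "__main__":
    for argv in export_commands({"vmimage001": "/dev/vg0/vmimage001"}):
        print(" ".join(argv))
```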
And if that other node is a Xen domain 0, that block device is very handily ready to be used as a VM image, just as if it were a raw partition on that node.

Here’s an example Xen configuration using the imported block device:
disk = [ 'phy:/dev/gnbd/vmimage001,sda1,w' ]
The guest VM needs no awareness of any of these tools. It just sees its sda1 and mounts it like any other “hardware” partition that Xen presents to it.
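Xen’s `xm` domain config files are themselves Python. One way to stamp out a config for each GNBD-backed VM is a small template function like this hypothetical `render_domu_config`. The kernel path, memory size and bridge name are assumptions for illustration. The `disk` entry has the same form as the one shown above.

```python
def render_domu_config(name: str, memory_mb: int, gnbd_name: str,
                       kernel: str = "/boot/vmlinuz-2.6-xen",
                       bridge: str = "xenbr0") -> str:
    """Return the text of a paravirtualized domU config file."""
    disk = f"phy:/dev/gnbd/{gnbd_name},sda1,w"   # imported GNBD device -> guest sda1
    lines = [
        f"name = {name!r}",
        f"memory = {memory_mb}",
        f"kernel = {kernel!r}",
        f"disk = [{disk!r}]",
        f"vif = ['bridge={bridge}']",
        "root = '/dev/sda1 ro'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_domu_config("web", 256, "vmimage001"))
```

Because the output is valid Python, it can be sanity-checked by executing it and inspecting the resulting variables.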
Instead of just using the file store for backups and VMs that are used intermittently, I’m also running persistent services like websites, the incremental-backup server and a media server in VMs stored there.
First, this gives basic backups of the LAN services without any backup software. That’s nice to have, although I really prefer a combination of incremental backups and RAID1. (It also avoids a Russell’s paradox situation with the backup server, which would otherwise have to back itself up.)
Second, keeping time-consuming-to-configure services in a VM allows me to replace hardware quickly, including whole computers in the event of a total failure: the only software I’ll ever need to reinstall is {Linux, DRBD, LVM, GNBD} for a file server node and {Linux, Xen} for a VMM node.
As long as network latency is really low (here it is under 1 ms), it doesn’t really matter for any of my uses that the disk is remote. The VMs always respond very well.
(I should mention: you could of course take GNBD out of the picture and run the VMs directly on hosts A and B if Xen were installed there.)
Another bonus: with GNBD, you can live-migrate a VM to any node that can do a GNBD import. This is nice to have, though I only live-migrate manually. Both DRBD and GNBD have features that allow for seamless failover, but I don’t really need any of that at home.
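Migration only works if the target can see the same block device, so a manual migration helper might check that first. This is a hypothetical sketch. It assumes you already know which `/dev/gnbd/...` devices the target has imported, and it only builds the `xm migrate --live` command. The target’s xend must also be configured to accept relocation requests.

```python
from __future__ import annotations


def migrate_command(domain: str, target_host: str,
                    vm_devices: list[str], target_devices: list[str]) -> list[str]:
    """Build an `xm migrate --live` command, refusing if the target host
    has not imported every block device the VM uses."""
    missing = sorted(set(vm_devices) - set(target_devices))
    if missing:
        raise ValueError(f"{target_host} has not imported: {', '.join(missing)}")
    return ["xm", "migrate", "--live", domain, target_host]
```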
To learn more about that, check out this paper on the new DRBD (it is interesting): http://www.drbd.org/fileadmin/drbd/publications/drbd8_wpnr.pdf
Thinking for a minute about high availability in this kind of setup, here is a simple arrangement for services that need to be up at all times. Take two DRBD-mirrored nodes, run VMs on one or both of them, and have the physical nodes send heartbeats to each other. This is simpler than a centralized file server with block-device export. Here we just have two peer VMMs that are “watching out” for each other.
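The “watching out for each other” part boils down to deciding when the peer is gone. Here is a minimal sketch. It assumes heartbeats arrive at a fixed interval and declares the peer dead after a few missed ones. Linux-HA Heartbeat’s `deadtime` setting plays a similar role. `PeerMonitor` is a hypothetical name.

```python
from __future__ import annotations


class PeerMonitor:
    """Decide whether the peer node is dead from heartbeat timestamps."""

    def __init__(self, interval: float, max_missed: int = 3):
        self.interval = interval        # seconds between expected beats
        self.max_missed = max_missed    # beats we tolerate losing
        self.last_beat: float | None = None

    def beat(self, now: float) -> None:
        """Record a heartbeat received from the peer at time `now`."""
        self.last_beat = now

    def peer_dead(self, now: float) -> bool:
        if self.last_beat is None:
            return False  # never heard from the peer yet: don't take over
        return now - self.last_beat > self.interval * self.max_missed
```

In practice the heartbeats should travel over more than one network path. Otherwise a single cable failure makes each node believe the other is dead, and both try to become master.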
You’d have two master/slave arrangements. In the normal operating case, one VMM has partition A as DRBD master and partition B as secondary, while the other VMM has partition A as secondary and partition B as master. Each VM runs on the node that is currently the DRBD master for that VM’s partition.
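The normal-case layout can be written down as plain data. That makes the rule “a VM runs where its partition is DRBD master” easy to check. The node, partition and VM names are hypothetical.

```python
from __future__ import annotations

# Normal operation: each node is DRBD master (Primary) for one partition
# and secondary for the other.
ROLES = {
    "nodeA": {"partA": "Primary", "partB": "Secondary"},
    "nodeB": {"partA": "Secondary", "partB": "Primary"},
}
# Two VMs per partition, so two VMs per physical node.
VM_PARTITION = {"vm1": "partA", "vm2": "partA", "vm3": "partB", "vm4": "partB"}


def primary_node(resource: str, roles: dict[str, dict[str, str]]) -> str:
    """The single node that is Primary for `resource`."""
    primaries = [node for node, r in roles.items() if r.get(resource) == "Primary"]
    if len(primaries) != 1:
        raise RuntimeError(f"{resource}: expected one Primary, found {primaries}")
    return primaries[0]


def placement(vm_partition: dict[str, str],
              roles: dict[str, dict[str, str]]) -> dict[str, str]:
    """Where each VM should run: on the Primary node of its partition."""
    return {vm: primary_node(res, roles) for vm, res in vm_partition.items()}
```

After a failover, the same function shows every VM landing on the surviving node once that node is Primary for both partitions.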
Let’s say you split four services into four VMs and put two VMs on each physical node. Then the disk in one of the physical nodes fails entirely, and a monitor process notices. The heartbeat script makes sure the OK node is now the DRBD master for both partitions A and B. It then boots, on the OK node, the two VMs previously hosted on the failed node, re-allocating RAM for the time being to accommodate all four VMs.
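The failover steps in that scenario could be planned by a script along these lines. This is a sketch under several assumptions:

- It runs on the surviving node.
- It takes the DRBD roles (node to {partition: role}) and the VM-to-partition map as plain dicts.
- It splits the RAM left after a dom0 reserve equally among all four VMs.
- The VMs have balloon drivers, so `xm mem-set` can shrink them.
- Config files live at `/etc/xen/<vm>.cfg`, and `memory=` is passed as an `xm create` override.
- The survivor’s copy is up to date, so a plain `drbdadm primary` succeeds.

```python
from __future__ import annotations


def failover_plan(failed: str, survivor: str,
                  roles: dict[str, dict[str, str]],
                  vm_partition: dict[str, str],
                  host_ram_mb: int, dom0_reserve_mb: int) -> list[list[str]]:
    """Commands for `survivor` to take over everything `failed` was running."""
    plan: list[list[str]] = []
    # 1. Become DRBD Primary for every partition we were Secondary for.
    for res, role in sorted(roles[survivor].items()):
        if role != "Primary":
            plan.append(["drbdadm", "primary", res])
    # 2. Split the RAM left after dom0 equally across all VMs.
    running = sorted(vm for vm, res in vm_partition.items()
                     if roles[survivor][res] == "Primary")
    orphans = sorted(vm for vm, res in vm_partition.items()
                     if roles[failed][res] == "Primary")
    share = (host_ram_mb - dom0_reserve_mb) // (len(running) + len(orphans))
    # 3. Shrink the VMs already here first, then boot the orphaned ones.
    for vm in running:
        plan.append(["xm", "mem-set", vm, str(share)])
    for vm in orphans:
        plan.append(["xm", "create", f"/etc/xen/{vm}.cfg", f"memory={share}"])
    return plan
```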

The applications in those VMs recover just as if they went through a normal hard system reset (their network addresses can stay the same since both physical nodes are on the same LAN). Once the administrator gets the alert email and puts a new disk in, another script is ready to resync DRBD and then migrate two of the VMs back to their normal place.
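After the new disk is in, the resync-and-move-back step could look like this sketch. `drbdadm create-md` and `drbdadm up` prepare and reattach the blank disk, and the resync from the surviving copy starts once it connects. `resync_done` checks the connection and disk states that `/proc/drbd` reports. The VMs are moved back cold (shut down, hand over the DRBD master role, boot), because a live migration would need both nodes to be primary at once. The host, resource and config-file names are hypothetical.

```python
from __future__ import annotations

import re


def resync_done(proc_drbd: str, minor: int) -> bool:
    """True once DRBD minor `minor` is Connected with both disks UpToDate."""
    for line in proc_drbd.splitlines():
        m = re.match(rf"\s*{minor}:\s+(.*)", line)
        if m:
            status = m.group(1)
            return "cs:Connected" in status and "ds:UpToDate/UpToDate" in status
    return False


def rebuild_steps(home: str, resource: str) -> list[tuple[str, list[str]]]:
    """On the repaired node: fresh DRBD metadata, then attach and connect."""
    return [(home, ["drbdadm", "create-md", resource]),
            (home, ["drbdadm", "up", resource])]


def move_back_steps(home: str, survivor: str, resource: str,
                    vms: list[str]) -> list[tuple[str, list[str]]]:
    """Cold-move `vms` back to `home`; run only after resync_done() is True."""
    steps = [(survivor, ["xm", "shutdown", "-w", vm]) for vm in vms]
    steps += [(survivor, ["drbdadm", "secondary", resource]),
              (home, ["drbdadm", "primary", resource])]
    steps += [(home, ["xm", "create", f"/etc/xen/{vm}.cfg"]) for vm in vms]
    return steps
```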
This seems like something to consider for a highly hammered and important head node (like a Globus GRAM node for example). All it takes is another node, commodity hardware and open source software!

12 Responses to “DRBD, LVM, GNBD, and Xen for free and reliable SAN”
March 18th, 2007 at 10:13 am
Hi all. Nice blog. It’s a very interesting article.
March 21st, 2007 at 9:14 am
Hi all,
looks very good.
Has anybody tried the HA solution with DRBD 8?
March 21st, 2007 at 5:16 pm
Just wanted to clarify that this really requires 3 nodes to avoid a deadlock.
“You MUST NOT use gnbd to import devices on the
machine that they are being exported from. This will deadlock your machine.”
http://sources.redhat.com/cluster/gnbd/gnbd_usage.txt
March 21st, 2007 at 6:37 pm
I’m a bit confused about having one node with partition A as a DRBD master and the other partition on that same node as the secondary. Why not just have that whole node function as master for all its VMs? Is that meant to distribute processing and read I/O? I think this arrangement is complicated and confusing. It seems simpler to just have one node be the DRBD master for everything and use a powerful system.
I’m also assuming that the VMs must be running Linux, because DRBD runs only on Linux.
March 22nd, 2007 at 9:37 am
Brandon: thanks for the link, I did not know that (and never ran into the failure because I don’t use it like that). But right next to that quote, the document says you can bypass the import and just use the local block device name. Probably some juggling in the scripts to know which one to use would work.
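The juggling could be as small as picking the device path per host. Use the local LV when the VM runs on the exporting server, and the GNBD import everywhere else. This is a hypothetical sketch, and the `vg0` volume group and `sda1` guest device are assumptions.

```python
from __future__ import annotations

import socket


def disk_spec(lv_name: str, exporting_host: str, this_host: str | None = None,
              vg: str = "vg0", guest_dev: str = "sda1") -> str:
    """Return a Xen `phy:` disk entry that never GNBD-imports from ourselves."""
    if this_host is None:
        this_host = socket.gethostname()
    if this_host == exporting_host:
        path = f"/dev/{vg}/{lv_name}"      # local LV: avoids the import deadlock
    else:
        path = f"/dev/gnbd/{lv_name}"      # remote: use the GNBD import
    return f"phy:{path},{guest_dev},w"
```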
March 22nd, 2007 at 9:40 am
mauricev: if you have an extra computer and can afford not to utilize it then to keep things simpler you could put everything on one and keep the backup idle, sure why not.
October 30th, 2007 at 4:24 pm
If you want to run a two node system you can skip the GNBD configuration altogether. I use Xen/DRBD on two systems. The DRBD configuration points to LVM disk slices. Xen runs on both machines. I use a heartbeat configuration like mentioned above and it works fine. I use DRBD on top of the LVM slices for good reason. This way I can load balance on two servers. The configuration will fall back to one server if one of the Xen nodes goes down. Of course you need to be sure that each node has enough memory for running all the VMs if it does fail-over.
January 30th, 2008 at 7:40 am
Hi, I need GNBD to export ISO files across the network, nothing more than that. But I need a cluster to make SAN storage available to all nodes connected to my mail server (to avoid running vgscan and vgchange -ay every time a logical volume is created).
But I need to use gnbd_client and gnbd_serv on the same physical machine. Please help.
April 2nd, 2008 at 10:35 am
Ran into this related article:
“Using Xen for High Availability Clusters”
http://www.onlamp.com/pub/a/onlamp/2008/02/05/using-xen-for-high-availabilty-clusters.html
May 8th, 2008 at 3:37 pm
In response to: I’m also assuming that VMs must be running Linux because the drbd runs only within Linux.
The guest VMs may run any operating system. The specifics of DRBD and/or GNBD are not at all visible to the guest OS. All it sees is a (virtual) physical block device. If you combine this with HVM (hardware virtual machine) support, then you can run any OS on top of the high-availability infrastructure.
Our business successfully depends on this capability.
May 13th, 2008 at 1:17 am
Also see:
http://docs.google.com/View?docID=dhh4z6n4_96w387mqhn&revision=_latest
Preparing For EC2 Persistent Storage — Using LVM DRBD NFS Heartbeat VTun To Gain Data Persistence, Redundancy, Automatic Fail-Over, and Read/Write Disk Access Across Multiple EC2 Nodes
June 11th, 2008 at 6:33 am
DRBD from 8.0.6 onwards has a script to support Xen directly:
http://blogs.linbit.com/florian/2007/09/03/drbd-806-brings-full-live-migration-for-xen-on-drbd/
Configure a separate drbd resource for each VM, e.g. on top of LVM, so that you can start each virtual machine on either server. If you set allow-two-primaries then you can even live-migrate them.
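With the per-VM layout described here, each resource stanza looks much the same, so a small generator helps. This sketch assumes:

- DRBD 8 configuration syntax
- internal metadata
- one DRBD minor and one TCP port per VM
- hypothetical host names and addresses

The `drbd:<resource>` disk syntax is the one provided by the Xen block script mentioned above. `allow-two-primaries` is meant only for the brief overlap during a live migration, not for running the VM on both nodes.

```python
from __future__ import annotations


def drbd_resource(name: str, minor: int, lv: str,
                  nodes: list[tuple[str, str]], port: int) -> str:
    """Render one drbd.conf resource section for a VM's LVM volume."""
    lines = [
        f"resource {name} {{",
        "  protocol C;",
        "  net { allow-two-primaries; }",
    ]
    for host, ip in nodes:
        lines += [
            f"  on {host} {{",
            f"    device /dev/drbd{minor};",
            f"    disk {lv};",
            f"    address {ip}:{port};",
            "    meta-disk internal;",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


def xen_disk_entry(resource: str, guest_dev: str = "xvda") -> str:
    """Disk entry for the drbd block script instead of a phy: path."""
    return f"drbd:{resource},{guest_dev},w"


if __name__ == "__main__":
    nodes = [("nodeA", "10.0.0.1"), ("nodeB", "10.0.0.2")]
    print(drbd_resource("vm1", 1, "/dev/vg0/vm1", nodes, 7789))
    print(f"disk = [{xen_disk_entry('vm1')!r}]")
```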