John Pelan @ Gatsby Unit

Notes on HA NFS using SLES9

While there are a number of descriptions of how to achieve high-availability NFS using Linux with the heartbeat and DRBD packages, there are currently none for Novell's SuSE Linux Enterprise Server (SLES9). This ordinarily wouldn't matter, as all good Linux distributions are pretty similar, but SLES9 uses the kernel NFS server - including built-in and stripped-down rpc daemons - so, for example, the nfs-utils package does not contain rpc.statd. This means that the instructions will differ slightly. The good news is that all the required packages, heartbeat (version 1) and drbd (version 0.7), come as standard.

An additional package that you might find useful is unison. This very handy tool can help you keep your configuration files synchronised between nodes - a binary RPM for unison can be obtained from a SuSE 9.1 or later distribution.
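For example, a one-off invocation to reconcile the heartbeat and drbd configuration between nodes might look like this (the hostname node2 and the chosen paths are illustrative):

unison /etc ssh://node2//etc -path ha.d -path drbd.conf

Run from either node, unison propagates changes in both directions and flags any conflicts for you to resolve.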

You should read these notes in conjunction with the ones linked to above.  These are notes and not a HOWTO. If you already know all about heartbeat and DRBD, just skip to the end.

Get DRBD running!

You should get DRBD running first - read the instructions elsewhere. It is astoundingly easy to set up and get going, and while some people do seem to have an awful lot of trouble with it, that usually stems from the fact that they haven't grasped what DRBD is - it is a block device that mirrors between underlying storage on two independent nodes. The block device manifests itself on both nodes but only one node can perform any access (read or write) - the other node merely performs the mirroring to local storage and can, if desired, be failed over to. The underlying physical storage obviously consists of two independent units, each one locally attached to a node. A node can fail or a storage unit can fail and everything will still function normally if configured properly. This is not a shared block device, nor is it (yet) capable of running a shared filesystem. It certainly doesn't magically convert a non-shared FS into a shared FS. It isn't a backup mechanism either.

You should not mess about with the underlying storage - only access the drbd block device, and then only on the primary node. So, if you create a filesystem or LVM volume, do so on the drbd block device. Personally, I think it better to use LVM on top of DRBD; that way the full benefits of LVM are exposed to the filesystem, and in any case you can always create additional drbd devices and add them to the volume group. Doing it the other way needs more care and attention. YMMV.
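As an illustration, a minimal drbd 0.7 resource in /etc/drbd.conf might look like the following - the hostnames, backing disks and IP addresses are made up, so substitute your own. The resource name ha-disk0 is the one heartbeat's drbddisk script refers to later.

resource ha-disk0 {
  protocol C;
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}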

You should link your nodes together with gigabit ethernet - most GbE NICs handle auto MDI/MDI-X so you don't need a 'cross-over' cable, and NICs are inexpensive and suitably fast.

Don't forget to start drbd automatically on boot-up;

chkconfig drbd on

Avoid Automatic and Unneeded Features!

One feature of SLES9 is that on boot-up, it merrily probes all your hardware and tries to deal with everything it finds - including storage units. This may interfere with your high availability set-up so it is best to switch these off where possible. So, for example, if you want to stop LVM probing the DRBD device(s) you can filter them out in /etc/lvm/lvm.conf or, if you are using LVM in your HA set-up, you can set LVM_VGS_ACTIVATED_ON_BOOT in /etc/sysconfig/lvm to something that isn't a volume group name. This will ensure that all LVM volumes will remain inactive until called explicitly.
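For instance, a filter in /etc/lvm/lvm.conf that rejects the drbd devices while accepting everything else could look like this:

filter = [ "r|/dev/drbd.*|", "a/.*/" ]

and the sysconfig alternative is simply any value that matches no real volume group, e.g. (assuming you have no volume group actually called "none"):

LVM_VGS_ACTIVATED_ON_BOOT="none"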

Likewise, the hotplug features can be noisy, especially on the inactive node - disable them if you don't need them. Indeed, you should disable all services that you don't really need - it'll keep things simple.

Get Heartbeat running!

Again, heartbeat is easy to get going - read the instructions elsewhere. You really should have two independent heartbeat links - typically an RS-232 serial connection (using a null modem cable) and an ethernet connection. Test the connections properly and make sure they work repeatedly post-reboot. I prefer to have dedicated heartbeat links which are independent of those used for drbd communication.
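The two links translate into a pair of media lines in /etc/ha.d/ha.cf; a sketch follows, where the interface, timings and node names are examples only:

serial    /dev/ttyS0
baud      19200
bcast     eth2
keepalive 2
deadtime  30
node      node1
node      node2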

Don't forget to start heartbeat automatically on boot-up, but while testing it is best to switch it on and off manually;

chkconfig heartbeat off

Setting up HA NFS.

OK, this is the meat for an active/passive HA NFS system. Please note that experience strongly suggests that your HA nodes should not be NFS clients (either of themselves or of another server), and this assumption is built into the configuration. While it is possible to get it to work, not being NFS clients really keeps things simple and less prone to unexpected failure modes. Those using STONITH (and it is recommended) have less to worry about.

As NFS file handles are a function of the exported device's major and minor numbers, you should make sure these are the same on both nodes, otherwise you will suffer from stale NFS file handles (incidentally, that error message is indicative of a number of faults). If your systems are identically configured then that'll be taken care of naturally.
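A quick check is to compare the device numbers on each node - drbd devices use major number 147, so for the first device you would expect something like:

ls -l /dev/drbd0
brw-------  1 root root 147, 0 ... /dev/drbd0

It is the "147, 0" pair that must be identical on both nodes.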

Clearly, heartbeat should be running the key NFS services so make sure they aren't activated automatically at boot-up by running the following on both nodes;

chkconfig nfsboot off
chkconfig nfsserver off
chkconfig nfslock off

When the system boots up (or is failed-over to) you want to do the following;
  1. make the drbd device primary
  2. mount the drbd device
  3. set the virtual ip address
  4. run nfsboot
  5. run nfsserver
  6. run nfslock

This should be obvious, but the order is significant - heartbeat calls these actions in reverse order for a shutdown/migration.
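For reference, the takeover sequence above is roughly equivalent to doing the following by hand - the resource and mount names match the haresources example in this document, and the cluster IP is left as a placeholder:

drbdadm primary ha-disk0
mount /dev/drbd0 /home
/etc/ha.d/resource.d/IPaddr <cluster-ip> start
/etc/init.d/nfsboot start
/etc/init.d/nfsserver start
/etc/init.d/nfslock start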

The nfsboot script calls /sbin/sm-notify - this is run once and its job is to signal the NFS clients that the server has rebooted (or equivalently failed-over).  It is important that this runs before the server is started. When called with the stop action it does nothing. You should get sm-notify to bind to the cluster address by changing the options in nfsboot;

OPTIONS=" -q -v"

As the NFS state information should be on the active server you should locate the contents of /var/lib/nfs on the drbd block device that you are exporting - although obviously that directory does not have to be exported. Create a soft-link from /var/lib/nfs to the appropriate directory and do this on both nodes.
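A one-off relocation along these lines does the trick - the directory name .nfsstate is just an example, and /home is the drbd-backed mount from the haresources example. Do the first part on the primary with the filesystem mounted, then create the same link on the secondary:

# on the primary, with /home mounted:
mv /var/lib/nfs /home/.nfsstate
ln -s /home/.nfsstate /var/lib/nfs

# on the secondary:
mv /var/lib/nfs /var/lib/nfs.orig
ln -s /home/.nfsstate /var/lib/nfs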

Although the nfslock script is largely redundant on a kernel nfsd system, it serves a useful function when called with the stop action, as that sends a kill signal to the lockd kernel daemon, which permits the nfs server to stop cleanly. killproc appears to always return a non-zero result code - it seems to expect the kernel process to die. It is therefore necessary to modify the script so that the stop action always succeeds - you can do this a dozen ways but I just used the existing echo command ;-)

[extract of /etc/init.d/nfslock]

        if [ -n "$RPCLOCKD" ] ; then
            # Swapped killproc and echo so that echo's return code is used ;-)
            killproc -n -KILL lockd
            echo -n "Shutting down NFS file locking daemon "
            rc_status -v

So what does /etc/ha.d/haresources look like ?

node1       drbddisk::ha-disk0  \
            Filesystem::/dev/drbd0::/home::ext3 \
            IPaddr:: \
            nfsboot nfsserver nfslock

That's about it. Be sure to test and test again. Be brutal.

BTW, the mountpoint option in /etc/exports is a useful safeguard depending on what you are exporting. I don't know if the kernel nfsd honours the fsid option, but it shouldn't be needed even if it looks ideal for the task in hand.
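As a sketch, an exports line using that safeguard might read as follows - the network and the other options are illustrative:

/home   192.168.1.0/24(rw,sync,mountpoint)

With mountpoint set, the directory is only exported if a filesystem is actually mounted there, so a node with the drbd device unmounted won't accidentally export the empty mountpoint underneath.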

I have kept this example simple, but my actual systems are more complicated and use LVM on top of DRBD with iSCSI storage.
Everything seems to work as of SLES9 SP3, although I have had a few problems after an upgrade, including the /var/lib/nfs links being broken. You should always expect that sort of thing after an upgrade.