Notes on HA NFS using SLES9
While there are a number of descriptions of how to achieve
high-availability NFS on Linux using the heartbeat and DRBD packages,
there are currently none for Novell's SuSE Linux Enterprise Server
(SLES9). Ordinarily this wouldn't matter, as all good Linux
distributions are broadly similar, but SLES9 uses the kernel NFS
server, with correspondingly stripped-down rpc daemons - for example,
the nfs-utils package does not contain rpc.statd. This means the
instructions differ slightly. The good news is that all the required
packages - heartbeat (version 1) and drbd (version 0.7) - come as
standard.
An additional package that you might find useful is unison. This very
handy tool can help you keep your configuration files synchronised
between nodes - a binary RPM for unison can be obtained from a SuSE 9.1
or later distribution.
You should read these notes in conjunction with the ones linked to
above. These are notes, not a HOWTO. If you already know all about
heartbeat and DRBD, just skip to the end.
Get DRBD running!
You should get DRBD running first - read the instructions elsewhere. It
is astoundingly easy to set up and get going, and while some people do
seem to have an awful lot of trouble with it, that usually stems from
not having grasped what DRBD is: a block device that mirrors between
underlying storage on two independent nodes. The block device manifests
itself on both nodes, but only one node can perform any access (read or
write) - the other node merely mirrors to its local storage and can, if
desired, be failed over to.
The underlying physical storage obviously
consists of two independent units, each one locally attached
to a node. A node can fail or a storage unit can fail and everything
will still function normally if configured properly. This is not a
shared block device nor is it capable (yet) of running a shared
filesystem. It certainly doesn't magically convert a non-shared FS into
a shared FS. It isn't a backup mechanism either.
You should not mess about with the underlying storage - only access the
drbd block device, and then only on the primary node. So, if you create
a filesystem or LVM volume, do so on the drbd block device. Personally,
I think it better to use LVM on top of DRBD: that way the full benefits
of LVM are exposed to the filesystem, and in any case you can always
create additional drbd devices and add them to the volume group. Doing
it the other way round needs more care and attention. YMMV.
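The LVM-on-top-of-DRBD arrangement can be sketched as follows. These
commands are illustrative, not from the article: the device, volume
group, logical volume and size are assumed names, and they must be run
on the primary node only, as root.

```
# Sketch: LVM on top of DRBD (primary node only; names are assumptions)
pvcreate /dev/drbd0                 # the drbd device becomes a physical volume
vgcreate vg_ha /dev/drbd0           # volume group backed solely by drbd
lvcreate -L 10G -n lv_home vg_ha    # carve out a logical volume
mkfs.ext3 /dev/vg_ha/lv_home        # filesystem on the LV, never on the raw disk
# Extra capacity later: bring up /dev/drbd1, then
#   pvcreate /dev/drbd1 && vgextend vg_ha /dev/drbd1
```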
You should link your nodes together with gigabit ethernet - most GbE
NICs handle auto MDI/MDI-X, so you don't need a 'cross-over' cable, and
the NICs are inexpensive and suitably fast.
Don't forget to start drbd automatically on boot-up;
chkconfig drbd on
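For reference, a minimal /etc/drbd.conf for drbd 0.7 along these lines
might look like the sketch below. The hostnames, backing partitions and
addresses are assumptions; the resource name ha-disk0 matches the
haresources example later in these notes.

```
resource ha-disk0 {
  protocol C;                  # fully synchronous replication
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sda5;       # local backing partition (assumed)
    address   10.0.0.1:7788;   # dedicated replication link (assumed)
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```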
Avoid Automatic and Unneeded Features!
One feature of SLES9 is that on boot-up it merrily probes all your
hardware and tries to deal with everything it finds - including storage
units. This may interfere with your high-availability set-up, so it is
best to switch these features off where possible. For example, if you
want to stop LVM probing the DRBD device(s), you can filter them out in
/etc/lvm/lvm.conf or, if you are using LVM in your HA set-up, you can
set LVM_VGS_ACTIVATED_ON_BOOT in /etc/sysconfig/lvm to something that
isn't a volume group name. This ensures that all LVM volumes remain
inactive until activated explicitly.
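A filter along these lines in the devices section of /etc/lvm/lvm.conf
should stop LVM scanning the drbd devices on boot - the exact regular
expressions here are an assumption, so adjust them to your set-up:

```
# /etc/lvm/lvm.conf (devices section): reject drbd devices, accept the rest
filter = [ "r|/dev/drbd.*|", "a|.*|" ]
```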
Likewise, the hotplug features can be noisy, especially on the inactive
node - disable them if you don't need them. Indeed, you should disable
all services that you don't really need - it keeps things simple.
Get Heartbeat running!
Again, heartbeat is easy to get going - read the instructions
elsewhere. You really should have two independent heartbeat links -
typically an RS-232 serial connection (using a null modem cable) and an
ethernet connection. Test the connections properly and make sure they
work repeatedly post-reboot. I prefer to have dedicated heartbeat links
which are independent of those used for drbd communication.
Don't forget to start heartbeat automatically on boot-up, but while
testing it is best to switch it on and off manually;
chkconfig heartbeat off
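A minimal /etc/ha.d/ha.cf for heartbeat version 1 with the two
independent links described above (serial plus ethernet) might look
like this sketch - the serial device, interface and node names are all
assumptions:

```
# /etc/ha.d/ha.cf - two heartbeat media; names are illustrative
serial    /dev/ttyS0        # null-modem cable between the nodes
baud      19200
bcast     eth2              # dedicated heartbeat ethernet link
keepalive 2                 # seconds between heartbeats
deadtime  30                # seconds before declaring a node dead
auto_failback off
node      node1 node2       # must match uname -n on each node
```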
Setting up HA NFS.
OK, this is the meat: an active/passive HA NFS system. Please note that
experience strongly suggests your HA nodes should not be NFS clients
(either of themselves or of another server), and this assumption is
built into the configuration. While it is possible to get that to work,
not being NFS clients keeps things simple and less prone to unexpected
failure modes. Those using STONITH (and it is recommended) have less to
worry about.
As NFS file handles are a function of the exported device's major and
minor numbers, you should make sure they are the same on both nodes,
otherwise you will suffer from stale NFS file handles (incidentally,
that error message is indicative of a number of faults). If your
systems are identically configured then that will be taken care of
naturally.
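You can compare the device numbers with stat on each node. The command
below is demonstrated against /dev/null only because that device exists
everywhere; substitute your exported drbd (or LVM) device on the real
nodes and check that both print the same numbers.

```shell
# Print the device's major:minor (in hex) - run on both nodes and compare.
# Substitute e.g. /dev/drbd0 for /dev/null on the real systems.
stat -c '%n major:minor = %t:%T' /dev/null
```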
Heartbeat should be running the key NFS services, so make sure they
aren't activated automatically at boot-up by running the following on
both nodes;
chkconfig nfsboot off
chkconfig nfsserver off
chkconfig nfslock off
When the system boots up (or is failed-over to) you want to do the
following;
- make the drbd device primary
- mount the drbd device
- set the virtual ip address
- run nfsboot
- run nfsserver
- run nfslock
This should be obvious, but the order is significant - heartbeat calls
these actions in reverse order for a shutdown/migration.
The nfsboot script calls /sbin/sm-notify - this is run once and its job
is to signal the NFS clients that the server has rebooted (or,
equivalently, failed over). It is important that this runs before the
server is started. When called with the stop action it does nothing.
You should get sm-notify to bind to the cluster address by changing the
options in nfsboot;
OPTIONS=" -q -v xxx.xxx.xxx.xxx"
As the NFS state information should follow the active server, you
should locate the contents of /var/lib/nfs on the drbd block device
that you are exporting - although obviously that directory does not
have to be exported itself. Create a soft-link from /var/lib/nfs to the
appropriate directory, and do this on both nodes.
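That relocation can be sketched as below. /home is the example mount
from these notes and nfs-state is an assumed directory name; the sketch
runs against a scratch prefix so it can be dry-run safely - on the real
nodes you would operate on / directly, with an absolute link target
such as /home/nfs-state.

```shell
# Sketch: move the NFS state onto the exported filesystem and link it back.
PREFIX=$(mktemp -d)                    # scratch stand-in for / (dry run only)
mkdir -p "$PREFIX/var/lib/nfs" "$PREFIX/home"
mv "$PREFIX/var/lib/nfs" "$PREFIX/home/nfs-state"   # on the primary node only
ln -s ../../home/nfs-state "$PREFIX/var/lib/nfs"    # on BOTH nodes
ls -l "$PREFIX/var/lib/"               # nfs should now be a symlink
```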
Although the nfslock script is largely redundant in a kernel-nfsd
system, it serves a useful function when called with the stop action,
as that sends a kill signal to the lockd kernel daemon - this permits
the nfs-server to stop cleanly. Unfortunately, killproc appears to
always return a non-zero result code - it seems to expect the kernel
process to die. It is therefore necessary to modify the script so that
the stop action always succeeds - you can do this a dozen ways, but I
just used the existing echo command ;-)
[extract of /etc/init.d/nfslock]
        ;;
    stop)
        if [ -n "$RPCLOCKD" ] ; then
            # Swapped killproc and echo so that echo's return code is used ;-)
            killproc -n -KILL lockd
            echo -n "Shutting down NFS file locking daemon "
            rc_status -v
        fi
        ;;
So what does /etc/ha.d/haresources
look like ?
node1   drbddisk::ha-disk0 \
        Filesystem::/dev/drbd0::/home::ext3 \
        IPaddr::192.168.1.100/24 \
        nfsboot nfsserver nfslock
That's about it. Be sure to test and test again. Be brutal.
BTW, the mountpoint option in /etc/exports is a useful safeguard,
depending on what you are exporting. I don't know if the kernel NFSd
honours the fsid option, but it shouldn't be needed even if it looks
ideal for the task in hand.
I have kept this example simple, but my actual systems are more
complicated and use LVM on top of DRBD with iSCSI storage. Everything
seems to work as of SLES9 SP3, although I have had a few problems after
an upgrade, including the /var/lib/nfs links being broken. You should
always expect that sort of thing, however.