instance filesystem errors due to server failover for instance shared storage...how to handle?

Chris Friesen chris.friesen at windriver.com
Wed Mar 12 05:49:07 UTC 2014


I'm looking for advice on setting up HA shared storage for instance 
virtual disks.

For now, as a starting point, we're exporting a chunk of the active 
controller node's storage via NFS and mounting it on the compute nodes.

We have two controllers in active-standby, and when we fail over or 
switch from one to the other, the instances seem to take disk faults and 
the instance rootfs goes read-only until someone reboots the instance.

We've tried NFS over both UDP and TCP, various retry settings, etc.  
None of it seems to help.
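
To make it concrete which client-side knobs I mean, here is a rough 
sketch of an /etc/fstab entry on a compute node.  This is illustrative 
only, not our exact configuration: the server name, export path, mount 
point and numeric values are all placeholders.

```
# Illustrative only: "controller", both paths and the numbers are placeholders.
# proto   = transport used for NFS requests (tcp or udp)
# timeo   = time the client waits before retrying a request, in tenths of a second
# retrans = number of retries before the client attempts further recovery action
controller:/export/instances  /var/lib/nova/instances  nfs  proto=tcp,timeo=600,retrans=5  0  0
```

The transport and retry variations we tried amount to changing proto, 
timeo and retrans on a line like this.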

If we just kill the active controller dead then it seems like the 
instances retry for some seconds and then take a failure right around 
the time that the newly-active controller would enable NFS.

Has anyone got any advice about how to handle this?  I'm hoping we just 
don't have it configured right...I would have expected NFS to be able to 
deal with this sort of thing.


