[Openstack-operators] instance filesystem errors due to server failover for instance shared storage...how to handle?

George Shuklin george.shuklin at gmail.com
Sun Mar 16 21:26:51 UTC 2014


I think this question is a bit outside the OpenStack domain and is more 
about NFS clustering.

I think you should first debug the HA setup without KVM/OpenStack, using 
only application I/O (e.g. fio). At that level the problem will probably 
come down to the mount type (hard/soft). Once fio survives an HA switch, 
add KVM (still without OpenStack) and configure libvirt so that KVM does 
not pass errors from a stalled NFS mount through to the guest. At that 
level the trouble will mostly be around the timeout settings of QEMU's 
virtio devices.
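
As a rough aid for the two checks above, here is a minimal Python sketch. 
It is only an illustration under some assumptions: that the mount table is 
in /proc/mounts format, that "not passing errors to the guest" maps to the 
libvirt disk driver attribute error_policy='stop' (my reading, not 
something tested on this setup), and the function names and sample data 
are made up for the example.

```python
import xml.etree.ElementTree as ET


def soft_nfs_mounts(mounts_text):
    """Return mount points of NFS filesystems mounted with the 'soft' option.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, fstype = fields[1], fields[2]
        options = fields[3].split(",")
        if fstype in ("nfs", "nfs4") and "soft" in options:
            flagged.append(mountpoint)
    return flagged


def disks_without_stop_policy(domain_xml):
    """Return target names of disks whose driver lacks error_policy='stop'.

    Assumption: with error_policy='stop' the guest is paused on a host
    I/O error instead of seeing the error itself.
    """
    root = ET.fromstring(domain_xml)
    flagged = []
    for disk in root.findall("./devices/disk"):
        driver = disk.find("driver")
        target = disk.find("target")
        policy = driver.get("error_policy") if driver is not None else None
        if policy != "stop":
            flagged.append(target.get("dev") if target is not None else "?")
    return flagged


# Hypothetical sample data, not taken from any real system.
SAMPLE_MOUNTS = (
    "ctrl:/export /var/lib/nova/instances nfs rw,soft,timeo=50 0 0\n"
    "/dev/sda1 / ext4 rw 0 0\n"
)
SAMPLE_DOMAIN = (
    "<domain><devices>"
    "<disk type='file'><driver name='qemu' type='qcow2'/>"
    "<target dev='vda'/></disk>"
    "</devices></domain>"
)

if __name__ == "__main__":
    print("soft NFS mounts:", soft_nfs_mounts(SAMPLE_MOUNTS))
    print("disks without stop policy:", disks_without_stop_policy(SAMPLE_DOMAIN))
```

On a real compute node you would feed the first function the contents of 
/proc/mounts and the second the output of "virsh dumpxml" for a test 
domain. It only flags settings to look at; it does not replace actually 
running fio through a failover.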

On 03/12/2014 07:49 AM, Chris Friesen wrote:
>
> Hi,
>
> I'm looking for advice on setting up HA shared storage for instance 
> virtual disks.
>
> Currently just for starters we're exporting a chunk of the active 
> controller node via nfs and mounting it on the compute nodes.
>
> We have two controllers in active-standby, and when we fail/switch 
> from one to the other it seems to cause the instances to take disk 
> faults and the instance rootfs goes to a read-only state until someone 
> reboots the instance.
>
> We've tried with NFS over UDP and TCP, various retries, etc. Doesn't 
> seem to help.
>
> If we just kill the active controller dead then it seems like the 
> instances retry for some seconds and then take a failure right around 
> the time that the newly-active controller would enable NFS.
>
> Has anyone got any advice about how to handle this?  I'm hoping we 
> just don't have it configured right...I would have expected NFS to be 
> able to deal with this sort of thing.
>
> Thanks,
> Chris
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



