<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Have you tried Ceph?</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div> <div id="bloop_sign_1395745558082529024" class="bloop_sign"><span style="font-family:helvetica,arial;font-size:13px"></span>-- <br>Nick Maslov<br><span>Sent with Airmail</span></div> <br><p style="color:#000;">On March 16, 2014 at 11:29:42 PM, George Shuklin (<a href="mailto:george.shuklin@gmail.com">george.shuklin@gmail.com</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div></div><div>I think that question is a bit outside the OpenStack domain and more about NFS clustering.

<br>

<br>I think you should first try to debug the HA settings without KVM/OpenStack,  

<br>using just some IO from an application (e.g. fio). I think at that level the  

<br>problem will be around the NFS mount type (hard vs. soft). Once fio survives an HA  

<br>switch, add KVM (without OpenStack) and configure libvirt so that KVM does not  

<br>pass errors from the stalled NFS mount through to the guest. At that level the  

trouble will be around the timeout settings for QEMU's virtio devices.

<br>

<br>On 03/12/2014 07:49 AM, Chris Friesen wrote:

<br>>

<br>> Hi,

<br>>

<br>> I'm looking for advice on setting up HA shared storage for instance  

<br>> virtual disks.

<br>>

<br>> Currently just for starters we're exporting a chunk of the active  

<br>> controller node via nfs and mounting it on the compute nodes.

<br>>

<br>> We have two controllers in active-standby, and when we fail/switch  

<br>> from one to the other it seems to cause the instances to take disk  

<br>> faults and the instance rootfs goes to a read-only state until someone  

<br>> reboots the instance.

<br>>

<br>> We've tried with NFS over UDP and TCP, various retries, etc. Doesn't  

<br>> seem to help.

<br>>

<br>> If we just kill the active controller dead then it seems like the  

<br>> instances retry for some seconds and then take a failure right around  

<br>> the time that the newly-active controller would enable NFS.

<br>>

<br>> Has anyone got any advice about how to handle this?  I'm hoping we  

<br>> just don't have it configured right...I would have expected NFS to be  

<br>> able to deal with this sort of thing.

<br>>

<br>> Thanks,

<br>> Chris

<br>>

<br>> _______________________________________________

<br>> OpenStack-operators mailing list

<br>> OpenStack-operators@lists.openstack.org

<br>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

<br>

<br>


<br></div></div></span></blockquote></body></html>