[Openstack-operators] Cinder block storage HA
Abel Lopez
alopgeek at gmail.com
Tue Sep 16 18:20:09 UTC 2014
Have you tried using the native EMC drivers? That way cinder only acts as a
broker between your instances and the storage back end, and you don't need
to worry (as much) about your cinder-volume service being HA.
On Tuesday, September 16, 2014, Juan José Pavlik Salles <jjpavlik at gmail.com>
wrote:
> Hi guys, I'm trying to add some HA to our cinder service. We have the
> following scenario:
>
> -Real backends: EMC CLARiiON (SATA drives) and HP StoreVirtual P4000 (SAS
> drives). These two backends export 2 big LUNs to our (one and only right
> now) cinder server.
> -Once these big LUNs are imported into the cinder server, two different VGs
> are created for two different cinder LVM drivers (cinder-volumes-1 and
> cinder-volumes-2). This way I have two different storage resources to give
> to my tenants.
>
> What I want is to deploy a second cinder server to act as a failover for
> the first one. Both servers are identical. So far I'm running a few tests
> with isolated VMs.
>
> -I installed corosync+pacemaker in 2 VMs and added a virtual IP.
> -Imported a LUN into the VMs over iSCSI and created a VG on it.
> -Exported an LV with tgt. More or less the same scenario we have in
> production.
>
> If one of the VMs dies, the second one picks up the virtual IP through
> which tgt is exporting the LUN, and the iSCSI session doesn't die. Here you
> can see part of the logs from the machine where the LUN is being imported:
>
> Sep 16 14:29:50 borrar-nfs kernel: [86630.416160] connection1:0: ping
> timeout of 5 secs expired, recv timeout 5, last rx 4316547395, last ping
> 4316548646, now 4316549900
> Sep 16 14:29:50 borrar-nfs kernel: [86630.418938] connection1:0: detected
> conn error (1011)
> Sep 16 14:29:51 borrar-nfs iscsid: Kernel reported iSCSI connection 1:0
> error (1011) state (3)
> Sep 16 14:29:53 borrar-nfs iscsid: connection1:0 is operational after
> recovery (1 attempts)
>
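
As a rough check on the log excerpt above, the gap between the detected
connection error and the recovery message can be read straight from the syslog
timestamps. The following is only a minimal Python sketch: the year is an
assumption (syslog lines omit it), the wrapped log lines have been rejoined,
and the helper names stamp and recovery_seconds are made up for illustration.

```python
import re
from datetime import datetime

# The four log lines quoted above, with the mail client's line wraps removed.
LOG = """\
Sep 16 14:29:50 borrar-nfs kernel: [86630.416160] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4316547395, last ping 4316548646, now 4316549900
Sep 16 14:29:50 borrar-nfs kernel: [86630.418938] connection1:0: detected conn error (1011)
Sep 16 14:29:51 borrar-nfs iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Sep 16 14:29:53 borrar-nfs iscsid: connection1:0 is operational after recovery (1 attempts)
"""

# Classic syslog prefix: "Mon DD HH:MM:SS".
STAMP = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")


def stamp(line, year=2014):
    """Parse the syslog timestamp of one line (year is assumed, not logged)."""
    match = STAMP.match(line)
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def recovery_seconds(log):
    """Seconds between the first conn error and the first recovery message."""
    lines = log.splitlines()
    down = next(l for l in lines if "detected conn error" in l)
    up = next(l for l in lines if "is operational after recovery" in l)
    return (stamp(up) - stamp(down)).total_seconds()


print(recovery_seconds(LOG))  # 3.0
```

Syslog timestamps only have one-second resolution, so this is an estimate. It
also does not include the time the initiator spent waiting for the ping
timeout before it declared the connection broken.
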
> This test was really simple, just one 1GB LUN, but it worked OK, even when
> the failover was tested during a write operation.
>
> So it seems to be a good solution so far, but there are a few things that
> worry me a bit:
>
> -Timeouts? How much time do I have to detect the problem and move the IP
> to the new node before the iSCSI connections die? I think I could play a
> little bit with timeo.noop_out_timeout in iscsid.conf.
> -What if a write operation was in progress when a node failed and it never
> reached the real backends? Could I come across inconsistencies in the
> volume FS? Any recommendations?
> -If I create a volume in cinder, the proper target file is created
> in /var/lib/cinder/volumes/volume-*, but I need the file to be created on
> both cinder nodes in case one of them fails. What would be a proper solution
> for this? Shared storage for the directory? SVN?
> -Should both servers be running tgt at the same time, or should I start
> tgt on the failover server only once the virtual IP has moved?
>
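
On the timeout question, the open-iscsi initiator has three relevant settings:
node.conn[0].timeo.noop_out_interval (how often it pings the target),
node.conn[0].timeo.noop_out_timeout (how long it waits for the ping reply
before declaring a connection error) and node.session.timeo.replacement_timeout
(how long it then waits for the session to come back before failing I/O up to
the filesystem). The "ping timeout of 5 secs ... recv timeout 5" in the log
above is consistent with the first two being 5. The sketch below is Python and
only does the arithmetic. The values in ISCSID_CONF are assumed (they match the
usual open-iscsi defaults, so check your own iscsid.conf), and parse_settings
and failover_budget are illustrative names.

```python
# Assumed initiator settings; replace with the values from your iscsid.conf.
ISCSID_CONF = """\
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
"""


def parse_settings(text):
    """Parse 'key = value' lines, ignoring blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = value
    return settings


def failover_budget(settings):
    """Return (worst-case detection time, I/O grace period) in seconds."""
    interval = int(settings["node.conn[0].timeo.noop_out_interval"])
    ping_timeout = int(settings["node.conn[0].timeo.noop_out_timeout"])
    replacement = int(settings["node.session.timeo.replacement_timeout"])
    # A dead target is noticed at most one ping interval plus one ping
    # timeout after it stops answering.
    detect = interval + ping_timeout
    return detect, replacement


detect, grace = failover_budget(parse_settings(ISCSID_CONF))
print(f"detected within ~{detect}s, I/O queued for up to {grace}s after that")
```

With these assumed values, pacemaker would have roughly two minutes after the
connection error to move the virtual IP and bring tgt up on the other node
before the initiator starts returning I/O errors. Lowering noop_out_timeout
makes detection faster but does not by itself extend that window.
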
> Any comments or suggestions will be more than appreciated. Thanks!
>
> --
> Pavlik Salles Juan José
> Blog - http://viviendolared.blogspot.com
>