[Openstack-operators] Cinder block storage HA

Juan José Pavlik Salles jjpavlik at gmail.com
Tue Sep 16 18:08:31 UTC 2014


Hi guys, I'm trying to add some HA to our Cinder service. This is our
current scenario:

-Real backends: EMC CLARiiON (SATA drives) and HP StoreVirtual P4000 (SAS
drives). These two backends export 2 big LUNs to our cinder server (the one
and only one, right now).
-Once these big LUNs are imported into the cinder server, two different VGs
are created for two different cinder LVM backends (cinder-volumes-1 and
cinder-volumes-2). This way I have two different storage resources to offer
to my tenants.
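For reference, a two-backend LVM setup like this is usually declared in cinder.conf with `enabled_backends` plus one section per backend. The sketch below renders such a file with Python's configparser. The section names (`lvm-sata`, `lvm-sas`) and backend names are hypothetical, and the driver path is the Icehouse-era LVM iSCSI driver, so check it against the release you actually run.

```python
import configparser
import io

# Hypothetical section and backend names; only the VG names come from the post.
BACKENDS = {
    "lvm-sata": ("cinder-volumes-1", "LVM_SATA"),  # VG on the EMC CLARiiON LUN
    "lvm-sas": ("cinder-volumes-2", "LVM_SAS"),    # VG on the HP StoreVirtual LUN
}


def build_cinder_conf(backends):
    """Render cinder.conf text with one LVM backend section per volume group."""
    conf = configparser.ConfigParser()
    # enabled_backends lists the section names cinder-volume should load.
    conf["DEFAULT"] = {"enabled_backends": ",".join(backends)}
    for section, (vg, backend_name) in backends.items():
        conf[section] = {
            "volume_group": vg,
            # Icehouse-era LVM iSCSI driver path; verify for your release.
            "volume_driver": "cinder.volume.drivers.lvm.LVMISCSIDriver",
            "volume_backend_name": backend_name,
        }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


print(build_cinder_conf(BACKENDS))
```

Tenants would then choose a backend through a volume type whose `volume_backend_name` extra spec matches one of these names.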

What I want is to deploy a second cinder server to act as a failover for the
first one. Both servers are identical. So far I'm running a few tests with
isolated VMs.

-I installed corosync+pacemaker on 2 VMs and added a virtual IP.
-Imported a LUN into the VMs via iSCSI and created a VG on it.
-Exported an LV with tgt. This is more or less the same scenario we have in
production.
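The tgt export in the last step boils down to a target stanza in tgt's configuration (the `<target ...>` / `backing-store` syntax read by tgt-admin). A minimal sketch that renders one, assuming a hypothetical IQN and LV path for the test VMs:

```python
def tgt_target_stanza(iqn: str, backing_store: str) -> str:
    """Render a tgt config stanza exporting one block device as a target."""
    return (
        f"<target {iqn}>\n"
        f"    backing-store {backing_store}\n"
        f"</target>\n"
    )


# Hypothetical names for the test setup; not taken from the post.
print(tgt_target_stanza("iqn.2014-09.example.test:lv-test", "/dev/vg-test/lv-test"))
```

The IQN has to be identical on both nodes: after a failover the initiator logs back in to the same target name through the same portal (the Pacemaker virtual IP).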

If one of the VMs dies, the second one takes over the virtual IP through
which tgt is exporting the LUN, and the iSCSI session survives. Here you can
see part of the logs from the host where the LUN is being imported:

Sep 16 14:29:50 borrar-nfs kernel: [86630.416160]  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4316547395, last ping 4316548646, now 4316549900
Sep 16 14:29:50 borrar-nfs kernel: [86630.418938]  connection1:0: detected conn error (1011)
Sep 16 14:29:51 borrar-nfs iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Sep 16 14:29:53 borrar-nfs iscsid: connection1:0 is operational after recovery (1 attempts)
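The `last rx` / `last ping` / `now` counters in the first kernel line are jiffies, so they can be turned into a timeline. A small sketch, assuming the initiator kernel was built with HZ=250 (an assumption, though it is consistent with the ~1250-jiffy gaps matching the 5 s timeouts; check your kernel config):

```python
import re

# Kernel message from the log above, rejoined onto one line.
LOG = ("connection1:0: ping timeout of 5 secs expired, recv timeout 5, "
       "last rx 4316547395, last ping 4316548646, now 4316549900")

# Assumption: CONFIG_HZ=250 on the initiator; HZ depends on the kernel build.
HZ = 250


def ping_timeline(line, hz=HZ):
    """Convert the jiffy counters of an iSCSI ping-timeout line into seconds."""
    m = re.search(r"last rx (\d+), last ping (\d+), now (\d+)", line)
    if m is None:
        raise ValueError("not an iSCSI ping-timeout line")
    last_rx, last_ping, now = (int(g) for g in m.groups())
    return {
        "idle_before_ping": (last_ping - last_rx) / hz,  # no traffic, then nop-out sent
        "ping_unanswered": (now - last_ping) / hz,       # waiting for the nop-in reply
        "rx_to_error": (now - last_rx) / hz,             # total time to detection
    }


for name, secs in ping_timeline(LOG).items():
    print(f"{name}: {secs:.2f} s")
```

Under that HZ assumption the connection was declared dead roughly 10 s after the last received data (about 5 s idle plus about 5 s waiting for the ping reply), and the syslog timestamps show iscsid recovering about 3 s after that.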

This test was really simple, just one 1GB LUN, but it worked OK, even when
the failover happened during a write operation.

So far it seems to be a good solution, but there are a few things that
worry me a bit:

-Timeouts: how much time do I have to detect the problem and move the IP to
the new node before the iSCSI connections die? I think I could tune
timeo.noop_out_timeout in iscsid.conf a little.
-What if a write operation is in progress when a node fails, and that write
never reaches the real backends? Could I end up with inconsistencies in the
volume's filesystem? Any recommendations?
-If I create a volume in cinder, the proper target file is created
in /var/lib/cinder/volumes/volume-*, but I need that file to exist on both
cinder nodes in case one of them fails. What would be a proper solution
for this? Shared storage for the directory? SVN?
-Should both servers run tgt at the same time, or should I start tgt on the
failover server only once the virtual IP has moved?
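On the timeout question, a rough budget can be read straight from iscsid.conf. The sketch below assumes the usual open-iscsi defaults (noop_out_interval 5, noop_out_timeout 5, replacement_timeout 120) and a simplified model: the initiator notices a dead target after about noop_out_interval + noop_out_timeout of silence, then keeps retrying, and only fails I/O back to the upper layers once replacement_timeout expires. So Pacemaker would have to move the VIP and have tgt exporting on the other node within that replacement window. Values in iscsid.conf are typically copied into node records at discovery time, so already-discovered targets may need their records updated too.

```python
# Hypothetical excerpt of /etc/iscsi/iscsid.conf with the usual open-iscsi
# defaults; compare against your own file.
ISCSID_CONF = """
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.timeo.replacement_timeout = 120
"""


def parse_iscsid_conf(text):
    """Parse 'key = value' lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = value
    return settings


def failover_budget(settings):
    """Rough worst-case timings (seconds) for an initiator losing its target."""
    interval = int(settings["node.conn[0].timeo.noop_out_interval"])
    timeout = int(settings["node.conn[0].timeo.noop_out_timeout"])
    replacement = int(settings["node.session.timeo.replacement_timeout"])
    detect = interval + timeout  # idle period plus unanswered nop-out ping
    return {
        "detect": detect,
        # I/O is failed upward only if the session is not re-established
        # within replacement_timeout after detection.
        "io_fails_after": detect + replacement,
    }


print(failover_budget(parse_iscsid_conf(ISCSID_CONF)))
# {'detect': 10, 'io_fails_after': 130}
```

Lowering the noop_out values speeds up detection but makes sessions more sensitive to short network hiccups; raising replacement_timeout gives the cluster more time at the cost of I/O hanging longer when the target really is gone.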

Any comments or suggestions will be more than appreciated. Thanks!

-- 
Pavlik Salles Juan José
Blog - http://viviendolared.blogspot.com