[openstack-dev] [Cinder]Behavior when one cinder-volume service is down

Dulko, Michal michal.dulko at intel.com
Tue Sep 15 14:23:17 UTC 2015


> From: Eduard Matei [mailto:eduard.matei at cloudfounders.com]
> Sent: Tuesday, September 15, 2015 4:04 PM
> 
> Hi,
> 
> This all started when we were testing Evacuate with our storage driver.
> We thought we found a bug
> (https://bugs.launchpad.net/cinder/+bug/1491276) then Scott replied that
> we should be running cinder-volume service separate from nova-compute.
> For some internal reasons we can't do that - yet, but we have some
> questions regarding the behavior of the service:
> 
> - on our original test setup we have 3 nodes (1 controller + compute + cinder,
> 2 compute + cinder).
> -- when we shut down the second node and tried to evacuate, the call was
> routed to the cinder-volume of the halted node instead of going to the other
> nodes (there were still 2 cinder-volume services up) - WHY?

Cinder assumes that each c-vol can control only the volumes that were scheduled onto it. Since volume services are differentiated by hostname, a known workaround is to set the same value for the host option in cinder.conf on each of the c-vols. This makes all the c-vols listen on the same queue. You may, however, run into race conditions when running such a configuration in an Active/Active manner. The generally recommended approach is to use Pacemaker and run the c-vols in Active/Passive mode. Also expect the scheduler's decisions to be effectively ignored, as all the nodes are listening on the same queue.
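For reference, the workaround is just this in cinder.conf on every node (the value itself is arbitrary and made up here; what matters is that it is identical on all three nodes):

```ini
[DEFAULT]
# Same value on all c-vol nodes, so they all consume one shared RPC queue.
host = cinder-cluster-1
```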

> - on the new planned setup we will have 6 nodes (3 dedicated controller +
> cinder-volume, 3 compute)
> -- in this case which cinder-volume will manage which volume on which
> compute node?

Same situation - a volume will be controlled by the c-vol that created it.
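As a rough sketch of why that is (this is illustrative pseudologic, not Cinder source; the hostnames are made up): Cinder casts RPC messages to a per-backend topic queue derived from the volume's host attribute, which is fixed at scheduling time. So every later operation on a volume goes to the queue of the c-vol that created it, whether or not that service is still alive.

```python
def volume_topic(volume_host):
    """Build the per-backend RPC topic, e.g. 'cinder-volume.node2'."""
    return "cinder-volume.%s" % volume_host

# Three c-vols, each consuming from its own queue (host = node name).
queues = {volume_topic(h): [] for h in ("node1", "node2", "node3")}

# A volume scheduled onto node2: every later operation on it is cast to
# node2's queue, even if node2 is down while node1/node3 are healthy.
volume = {"id": "vol-1", "host": "node2"}
queues[volume_topic(volume["host"])].append(("delete_volume", volume["id"]))

# With the same-`host` workaround from cinder.conf, all three c-vols would
# consume from one shared queue, so any surviving service could pick it up.
```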

> -- what if: one compute node and one controller go down - will the Evacuate
> still work if one of the cinder-volume services is down? How can we tell - for
> sure - that this setup will work in case ANY 1 controller and 1 compute nodes
> go down?

I think the best idea is to use c-vol + Pacemaker in an A/P manner. Pacemaker will make sure that on failure a new c-vol is spun up. Where do volumes physically reside in the case of your driver? Is it like the LVM driver (the volume lies on the node running c-vol) or like Ceph (Ceph takes care of where the volume lands physically, and c-vol is just a proxy)?
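As a sketch, the Pacemaker side of an A/P setup can look like the following (the resource name is made up, and the exact systemd unit name depends on your distribution; see the OpenStack HA guide for the full recipe):

```shell
# Run exactly one cinder-volume instance in the cluster; Pacemaker
# restarts it on a surviving node when the active node fails.
pcs resource create cinder-volume systemd:openstack-cinder-volume \
    op monitor interval=30s
```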

> 
> Hypothetical:
> - if 3 dedicated controller + cinder-volume nodes can perform evacuate
> when one of them is down (at the same time as one compute), WHY can't
> the same 3 nodes perform evacuate when the compute service is running on
> the same nodes (so 1 cinder is down and 1 compute)?

I think I've explained that.

> - if the answer to above question is "They can't " then what is the purpose of
> running 3 cinder-volume services if they can't handle one failure?

Running 3 c-vols is beneficial if you have multiple backends or use the LVM driver.

> - and if the answer to above question is "You only run one cinder-volume"
> then how can it handle failure of controller node?

I've explained that too. There are efforts in the community to make it possible to run c-vol in A/A, but I don't think there's an ETA yet.

> 
> Thanks,
> 
> Eduard
