[openstack-dev] [cinder] Resuming of workflows/tasks

Dulko, Michal michal.dulko at intel.com
Tue Feb 24 12:41:51 UTC 2015


Hi all,

I have been working on a spec[1] and a prototype[2] that make Cinder able to resume workflows after a server or service failure. The problem of lost requests and resources left in unresolved states after a failure was raised at the Paris Summit[3].

What I was able to prototype was resuming running tasks locally after a service restart, using the persistence API provided by TaskFlow. However, the core team agreed that we should aim at resuming workflows globally, so that other service instances can pick them up as well (which I think is a good decision).
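
To illustrate what local resumption means, here is a minimal sketch of the idea only. It does not use TaskFlow's persistence API; the state file, the task names and the run_workflow helper are all hypothetical. Each finished task is recorded, and a restarted service skips whatever is already recorded.

```python
import json
import os
import tempfile


def run_workflow(tasks, state_path, fail_at=None):
    """Run tasks in order, persisting the name of each finished task.

    fail_at is a hypothetical knob that simulates a crash just before
    the named task would run.
    """
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)  # state left behind by an earlier run

    executed = []
    for name, func in tasks:
        if name in done:
            continue  # finished before the restart, do not repeat it
        if name == fail_at:
            raise RuntimeError("simulated crash before %s" % name)
        func()
        executed.append(name)
        done.append(name)
        with open(state_path, "w") as f:
            json.dump(done, f)  # persist progress after every task
    return executed


# Hypothetical three-step flow; the lambdas stand in for real work.
tasks = [
    ("reserve_quota", lambda: None),
    ("create_db_entry", lambda: None),
    ("cast_to_scheduler", lambda: None),
]
state_path = os.path.join(tempfile.mkdtemp(), "flow_state.json")

try:
    run_workflow(tasks, state_path, fail_at="create_db_entry")
except RuntimeError:
    pass  # the "service" died here

# After the "restart" only the unfinished tasks are executed.
resumed = run_workflow(tasks, state_path)
```

Because the state lives on the local disk in this sketch, only the same host can resume the flow, which is exactly the limitation of the local approach.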

There are a few major problems blocking this approach:

1. We need a distributed lock to prevent the same task from being resumed by two instances of a service. Do we need tooz for that, or is there another solution?
2. Are we going to move away from TaskFlow? That idea came up at the mid-cycle meetup; what is its status? Without TaskFlow's persistence, implementing task resumption would be a lot more difficult.
3. For the cinder-api service, we are unable to monitor its state using the servicegroup API. Do we have alternatives here for deciding whether a particular workflow being processed by cinder-api has been abandoned?
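
To make the first and third questions more concrete, here is a minimal sketch of one possible alternative to a distributed lock: claiming a workflow with a single conditional UPDATE, and treating a workflow as abandoned once its owner's heartbeat is too old. This is an assumption for the sake of discussion, not something from the spec or the prototype. The table layout, the STALE_AFTER timeout and the try_claim helper are hypothetical, and sqlite3 only stands in for the real database.

```python
import sqlite3

STALE_AFTER = 60  # seconds without a heartbeat; hypothetical value


def make_db():
    """Create an in-memory table standing in for a workflow registry."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE workflows ("
        "id TEXT PRIMARY KEY, owner TEXT, heartbeat REAL)")
    return db


def try_claim(db, workflow_id, instance, now):
    """Atomically claim a workflow that is unowned or looks abandoned.

    The WHERE clause makes the check and the write one statement, so
    only one instance can win the claim for a given workflow.
    """
    cur = db.execute(
        "UPDATE workflows SET owner = ?, heartbeat = ? "
        "WHERE id = ? AND (owner IS NULL OR heartbeat < ?)",
        (instance, now, workflow_id, now - STALE_AFTER))
    db.commit()
    return cur.rowcount == 1


db = make_db()
db.execute("INSERT INTO workflows VALUES ('wf-1', NULL, NULL)")
db.commit()

# Instance "a" claims the unowned workflow.
first = try_claim(db, "wf-1", "cinder-volume-a", now=1000.0)
# Instance "b" tries 10 s later, while the heartbeat of "a" is fresh.
second = try_claim(db, "wf-1", "cinder-volume-b", now=1010.0)
# 100 s after the last heartbeat, "b" may take the workflow over.
takeover = try_claim(db, "wf-1", "cinder-volume-b", now=1100.0)
```

The current time is passed in explicitly to keep the example deterministic. A real service would also need to refresh its heartbeat periodically, and the timeout would have to be tuned against how long a healthy task can stay silent.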

As this topic has been deferred to the Liberty release, I want to start the discussion here so that it can be continued at the summit.

[1] https://review.openstack.org/#/c/147879/
[2] https://review.openstack.org/#/c/152200/
[3] https://etherpad.openstack.org/p/kilo-crossproject-ha-integration


