[openstack-dev] [Heat] Using Job Queues for timeout ops

Joshua Harlow harlowja at outlook.com
Thu Nov 13 21:50:07 UTC 2014

On Nov 13, 2014, at 10:59 AM, Clint Byrum <clint at fewbar.com> wrote:

> Excerpts from Zane Bitter's message of 2014-11-13 09:55:43 -0800:
>> On 13/11/14 09:58, Clint Byrum wrote:
>>> Excerpts from Zane Bitter's message of 2014-11-13 05:54:03 -0800:
>>>> On 13/11/14 03:29, Murugan, Visnusaran wrote:
>>>>> Hi all,
>>>>> Convergence-POC distributes stack operations by sending resource actions
>>>>> over RPC for any heat-engine to execute. The entire stack lifecycle will
>>>>> be controlled by worker/observer notifications. This distributed model
>>>>> has its own advantages and disadvantages.
>>>>> Every stack operation has a timeout, and a single engine is
>>>>> responsible for it. If that engine goes down, the timeout is lost along
>>>>> with it, so the traditional approach is for other engines to recreate
>>>>> the timeout from scratch. Also, a missed resource action notification
>>>>> will be detected only when the stack operation times out.
>>>>> To overcome this, we will need the following capability:
>>>>> 1. Resource timeout (can be used for retry)
>>>> I don't believe this is strictly needed for phase 1 (essentially we
>>>> don't have it now, so nothing gets worse).
>>> We do have a stack timeout, and it stands to reason that we won't have a
>>> single box with a timeout greenthread after this, so a strategy is
>>> needed.
>> Right, that was 2, but I was talking specifically about the resource 
>> retry. I think we agree on both points.
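
Since the stack timeout can no longer live in one engine's greenthread, one minimal strategy is sketched below. It assumes each stack operation's start time and timeout are persisted alongside the stack; the `StackOperation` record and `adopt_timeout` helper are hypothetical, not Heat code. Any surviving engine can then recompute the remaining time from the stored deadline instead of restarting the whole timeout from scratch.

```python
from dataclasses import dataclass


@dataclass
class StackOperation:
    """Hypothetical persisted record of an in-progress stack operation."""
    stack_id: str
    started_at: float  # epoch seconds, written when the operation starts
    timeout_s: float   # the stack timeout requested by the user

    @property
    def deadline(self) -> float:
        return self.started_at + self.timeout_s


def adopt_timeout(op: StackOperation, now: float) -> float:
    """Run by whichever engine takes over: return the seconds still left
    (0.0 means the operation has already timed out), so the new timer
    honours the original deadline rather than starting over."""
    return max(0.0, op.deadline - now)


op = StackOperation("stack-1", started_at=1000.0, timeout_s=3600.0)
print(adopt_timeout(op, now=1600.0))  # 3000.0 seconds remain
```

Relying on wall-clock timestamps assumes the engines' clocks are reasonably synchronised (e.g. via NTP).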
>>>> For phase 2, yes, we'll want it. One thing we haven't discussed much is
>>>> that if we used Zaqar for this then the observer could claim a message
>>>> but not acknowledge it until it had processed it, so we could have
>>>> guaranteed delivery.
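
To make the claim-then-acknowledge idea concrete, here is a toy in-memory model of claim semantics. It is not Zaqar's API; `ClaimQueue`, `claim` and `delete` are hypothetical names. A claimed message is hidden from other claimers for `ttl` seconds, deleting it with the matching claim id acts as the acknowledgement, and if the claimer dies the claim simply expires so another worker can pick the message up.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class _Message:
    body: str
    claimed_until: float = 0.0      # hidden from other claimers until this time
    claim_id: Optional[int] = None  # id of the most recent claim on this message


class ClaimQueue:
    """Toy model of claim/delete (claim, then acknowledge) semantics."""

    def __init__(self) -> None:
        self._messages: Dict[int, _Message] = {}
        self._next_msg = itertools.count()
        self._next_claim = itertools.count(1)

    def post(self, body: str) -> int:
        msg_id = next(self._next_msg)
        self._messages[msg_id] = _Message(body)
        return msg_id

    def claim(self, ttl: float, now: float,
              limit: int = 1) -> Tuple[int, List[Tuple[int, str]]]:
        """Claim up to `limit` visible messages for `ttl` seconds."""
        claim_id = next(self._next_claim)
        claimed = []
        for msg_id, msg in self._messages.items():
            if len(claimed) >= limit:
                break
            if msg.claimed_until <= now:  # never claimed, or the old claim expired
                msg.claimed_until = now + ttl
                msg.claim_id = claim_id
                claimed.append((msg_id, msg.body))
        return claim_id, claimed

    def delete(self, msg_id: int, claim_id: int) -> bool:
        """Acknowledge: only the current claim holder may delete the message."""
        msg = self._messages.get(msg_id)
        if msg is None or msg.claim_id != claim_id:
            return False
        del self._messages[msg_id]
        return True
```

The point of the TTL is that no separate failure detector is needed: a crashed observer just stops renewing or deleting, and the message reappears.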
>>> Frankly, if oslo.messaging doesn't support reliable delivery then we
>>> need to add it.
>> That is straight-up impossible with AMQP. Either you ack the message and
>> risk losing it if the worker dies before processing is complete, or you 
>> don't ack the message until it's processed and you become a blocker for 
>> every other worker trying to pull jobs off the queue. It works fine when 
>> you have only one worker; otherwise not so much. This is the crux of the 
>> whole "why isn't Zaqar just Rabbit" debate.
> I'm not sure we have the same understanding of AMQP, so hopefully we can
> clarify here. This stackoverflow answer echoes my understanding:
> http://stackoverflow.com/questions/17841843/rabbitmq-does-one-consumer-block-the-other-consumers-of-the-same-queue
> Not ack'ing just means they might get retransmitted if we never ack. It
> doesn't block other consumers. And as the link above quotes from the
> AMQP spec, when there are multiple consumers, FIFO is not guaranteed.
> Other consumers get other messages.
> So just add the ability for a consumer to read, work, ack to
> oslo.messaging, and this is mostly handled via AMQP. Of course that
> also likely means no ZeroMQ for Heat without accepting that messages
> may be lost if workers die.
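
A toy model of the behaviour described above, assuming a per-consumer prefetch limit like AMQP's `basic.qos`. The `Broker` class is a simplification for illustration, not an AMQP client and not RabbitMQ's actual scheduling: a consumer sitting on an unacknowledged delivery stops receiving more messages itself, but the broker keeps handing other ready messages to other consumers, and requeues the unacked ones if that consumer's channel closes.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class Broker:
    """Toy model of a single AMQP queue shared by several consumers."""

    def __init__(self, prefetch: int = 1):
        self.prefetch = prefetch                 # max unacked deliveries per consumer
        self.ready: Deque[str] = deque()         # messages waiting for a consumer
        self.unacked: Dict[str, List[str]] = {}  # consumer -> delivered, not yet acked

    def publish(self, body: str) -> None:
        self.ready.append(body)

    def subscribe(self, consumer: str) -> None:
        self.unacked[consumer] = []

    def deliver(self, consumer: str) -> Optional[str]:
        pending = self.unacked[consumer]
        if len(pending) >= self.prefetch or not self.ready:
            return None  # only this consumer is held back; others still get messages
        body = self.ready.popleft()
        pending.append(body)
        return body

    def ack(self, consumer: str, body: str) -> None:
        self.unacked[consumer].remove(body)

    def close(self, consumer: str) -> None:
        # Consumer died: its unacked deliveries go back to the front of the queue.
        for body in reversed(self.unacked.pop(consumer)):
            self.ready.appendleft(body)
```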
> Basically we need to add something that is not "RPC" but instead
> "jobqueue" that mimics this:
> http://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/rpc/dispatcher.py#n131
> I've always been suspicious of this bit of code, as it basically means
> that if anything fails between that call and the one below it, we have
> lost contact, but as long as clients are written to re-send when there
> is a lack of reply, there shouldn't be a problem. But, for a job queue,
> there is no reply, and so the worker would dispatch, and then
> acknowledge after the dispatched call had returned (including having
> completed the step where new messages are added to the queue for any
> newly-possible children).
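
A minimal sketch of that read, work, ack loop, assuming a single in-memory queue. The `JobQueueWorker` class and its `handler` contract are hypothetical; this is not an oslo.messaging API. Unlike the RPC dispatcher code linked above, which (as described) acknowledges before the call is dispatched, the job here is removed only after the handler has returned and any newly-possible children have been enqueued, so a worker that dies mid-job leaves the job on the queue.

```python
from collections import deque
from typing import Callable, Deque, List


class JobQueueWorker:
    """Hypothetical 'jobqueue' consumer: read, work, then ack."""

    def __init__(self, queue: Deque[str], handler: Callable[[str], List[str]]):
        self.queue = queue
        self.handler = handler  # does the work, returns newly-possible child jobs

    def run_one(self) -> bool:
        if not self.queue:
            return False
        job = self.queue[0]           # read: the job stays queued (unacked)
        children = self.handler(job)  # work: if this raises, nothing is acked
        self.queue.extend(children)   # enqueue children before acking the parent
        self.queue.popleft()          # ack: only now is the job removed
        return True
```

Acking after the work means a job can run twice if the worker dies after finishing but before the ack, so handlers need to be idempotent.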
> Just to be clear, I believe what Zaqar adds is the ability to peek at
> a specific message ID and not affect it in the queue, which is entirely
> different from ACK'ing the ones you've already received in your session.
>> Most stuff in OpenStack gets around this by doing synchronous calls 
>> across oslo.messaging, where there is an end-to-end ack. We don't want 
>> that here though. We'll probably have to make do with having ways to 
>> recover after a failure (kick off another update with the same data is 
>> always an option). The hard part is that if something dies we don't 
>> really want to wait until the stack timeout to start recovering.
> I fully agree. Josh's point about using a coordination service like
> Zookeeper to maintain liveness is an interesting one here. If we just
> make sure that all the workers that have claimed work off the queue are
> alive, that should be sufficient to prevent a hanging stack situation
> like you describe above.
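
The liveness idea can be sketched without a real coordination service. The `LivenessRegistry` below is a hypothetical in-memory stand-in for what ZooKeeper would provide through sessions and ephemeral nodes (e.g. via a client library such as kazoo): each worker's lease must be renewed within `ttl` seconds, and work claimed by a worker whose lease has lapsed can be re-queued right away instead of waiting for the stack timeout.

```python
from typing import Dict, List


class LivenessRegistry:
    """Hypothetical stand-in for a coordination service's liveness tracking."""

    def __init__(self, ttl: float):
        self.ttl = ttl                    # seconds a lease stays valid without renewal
        self.last_seen: Dict[str, float] = {}
        self.claims: Dict[str, str] = {}  # job id -> worker that claimed it

    def heartbeat(self, worker: str, now: float) -> None:
        self.last_seen[worker] = now

    def claim(self, job: str, worker: str, now: float) -> None:
        self.heartbeat(worker, now)
        self.claims[job] = worker

    def release(self, job: str) -> None:
        self.claims.pop(job, None)

    def is_alive(self, worker: str, now: float) -> bool:
        seen = self.last_seen.get(worker)
        return seen is not None and now - seen <= self.ttl

    def orphaned_jobs(self, now: float) -> List[str]:
        """Jobs whose claimer's lease has lapsed: candidates for re-queueing."""
        return sorted(j for j, w in self.claims.items() if not self.is_alive(w, now))
```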
>>> Zaqar should have nothing to do with this and is, IMO, a
>>> poor choice at this stage, though I like the idea of using it in the
>>> future so that we can make Heat more of an outside-the-cloud app.
>> I'm inclined to agree that it would be hard to force operators to deploy 
>> Zaqar in order to be able to deploy Heat, and that we should probably be 
>> cautious for that reason.
>> That said, from a purely technical point of view it's not a poor choice 
>> at all - it has *exactly* the semantics we want (unlike AMQP), and at 
>> least to the extent that the operator wants to offer Zaqar to users 
>> anyway it completely eliminates a whole backend that they would 
>> otherwise have to deploy. It's a tragedy that all of OpenStack has not 
>> been designed to build upon itself in this way and it causes me physical 
>> pain to know that we're about to perpetuate it.
>>>>> 2. Recover from engine failure (loss of the stack timeout or of a
>>>>> resource action notification)
>>>>> Suggestions:
>>>>> 1. Use a task queue like Celery to host timeouts for both stacks and resources.
>>>> I believe Celery is more or less a non-starter as an OpenStack
>>>> dependency because it uses Kombu directly to talk to the queue, vs.
>>>> oslo.messaging which is an abstraction layer over Kombu, Qpid, ZeroMQ
>>>> and maybe others in the future. i.e. requiring Celery means that some
>>>> users would be forced to install Rabbit for the first time.
>>>> One option would be to fork Celery and replace Kombu with oslo.messaging
>>>> as its abstraction layer. Good luck getting that maintained though,
>>>> since Celery _invented_ Kombu to be its abstraction layer.
>>> A slight side point here: Kombu supports Qpid and ZeroMQ. Oslo.messaging
>> You're right about Kombu supporting Qpid, it appears they added it. I 
>> don't see ZeroMQ on the list though:
>> http://kombu.readthedocs.org/en/latest/userguide/connections.html#transport-comparison
> They, confusingly, call it zmq, and it may not be in a recent release:
> https://github.com/celery/kombu/blob/master/kombu/transport/zmq.py
>>> is more about having a unified API than a set of magic backends. It
>>> actually boggles my mind why we didn't just use kombu (cue 20 reactions
>>> with people saying it wasn't EXACTLY right), but I think we're committed
>> Well, we also have to take into account the fact that Qpid support was 
>> added only during the last 9 months, whereas oslo.messaging was 
>> implemented 3 years ago and time travel hasn't been invented yet (for 
>> any definition of 'yet').
> Go back in time 3 years ago, and perhaps we could have done all the work
> we've done in kombu. Hindsight though.

+1 to this; I've seen the OpenStack community shy away from helping/improving other open source projects, which saddens me.

Kombu, I think, is in this category, but the future is unwritten and there is still hope!

>>> to oslo.messaging now. Anyway, celery would need no such refactor, as
>>> kombu would be able to access the same bus as everything else just fine.
>> Interesting, so that would make it easier to get Celery added to the 
>> global requirements, although we'd likely still have headaches to deal 
>> with around configuration.
> Yeah, I'm not advocating for celery, just pointing out that it has
> become more like what we already deploy. :)
>>>>> 2. Poll the database for engine failures and restart timers/retrigger
>>>>> resource retry (IMHO: this would be traditional but heavyweight)
>>>>> 3. Migrate Heat to use TaskFlow. (Too many code changes)
>>>> If it's just handling timed triggers (maybe this is closer to #2) and
>>>> not migrating the whole code base, then I don't see why it would be a
>>>> big change (or even a change at all - it's basically new functionality).
>>>> I'm not sure if TaskFlow has something like this already. If not we
>>>> could also look at what Mistral is doing with timed tasks and see if we
>>>> could spin some of it out into an Oslo library.
>>> I feel like it boils down to something running periodically checking for
>>> scheduled tasks that are due to run but have not run yet. I wonder if we
>>> can actually look at Ironic for how they do this, because Ironic polls
>>> power state of machines constantly, and uses a hash ring to make sure
>>> only one conductor is polling any one machine at a time. If we broke
>>> stacks up into a hash ring like that for the purpose of singleton tasks
>>> like timeout checking, that might work out nicely.
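
A rough sketch of that idea, assuming a plain consistent hash ring; this is not Ironic's implementation, and `HashRing` and `due_timeouts` are hypothetical. Every engine runs the same periodic check but acts only on the overdue stacks it owns on the ring, so, provided all engines share the same membership view, each timeout is handled by exactly one engine, and removing a dead engine only moves that engine's stacks.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for str.
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent hash ring mapping stack ids to engines."""

    def __init__(self, engines: Iterable[str], replicas: int = 64):
        self._points: List[int] = []
        self._owner: Dict[int, str] = {}
        for engine in engines:
            for i in range(replicas):  # virtual nodes smooth out the distribution
                point = _hash(f"{engine}-{i}")
                self._owner[point] = engine
                bisect.insort(self._points, point)

    def owner(self, stack_id: str) -> str:
        # First point clockwise from the stack's hash (wrapping around).
        idx = bisect.bisect(self._points, _hash(stack_id)) % len(self._points)
        return self._owner[self._points[idx]]


def due_timeouts(me: str, ring: HashRing, deadlines: Dict[str, float],
                 now: float) -> List[str]:
    """Periodic task run by every engine: return overdue stacks this engine owns."""
    return sorted(s for s, d in deadlines.items()
                  if d <= now and ring.owner(s) == me)
```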
>> +1 for something like this, and +2 if we can get it from a library we 
>> don't have to write ourselves (whether it be TaskFlow or something spun 
>> out of Mistral or Ironic into Oslo).
> Right, those things are fairly generic and would definitely fit nicely
> in a library.
> So, the simplest possible solution, I think, is to lock resource id +
> graph version. Since we are scared of Zookeeper, we'll need a periodic
> job in the engines that looks for stale locks, or we have to wait for
> another stack operation to check for them.
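
A sketch of the resource id + graph version lock with a periodic stale-lock sweep, assuming locks live in a shared table. The in-memory `ResourceLocks` class is hypothetical; in a real database `acquire` would need to be an atomic conditional insert, and `sweep_stale` would be the periodic job in each engine. Treating a lock as stale when its holder is not alive or it is older than `stale_after` is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

LockKey = Tuple[str, int]  # (resource id, graph version)


@dataclass
class Lock:
    engine_id: str
    acquired_at: float


class ResourceLocks:
    """Hypothetical lock table keyed on resource id + graph version."""

    def __init__(self, stale_after: float):
        self.stale_after = stale_after
        self.locks: Dict[LockKey, Lock] = {}

    def acquire(self, resource_id: str, version: int, engine_id: str,
                now: float) -> bool:
        key = (resource_id, version)
        if key in self.locks:
            return False  # someone else is already working on this version
        self.locks[key] = Lock(engine_id, now)
        return True

    def release(self, resource_id: str, version: int, engine_id: str) -> bool:
        key = (resource_id, version)
        lock = self.locks.get(key)
        if lock is None or lock.engine_id != engine_id:
            return False  # only the holder may release
        del self.locks[key]
        return True

    def sweep_stale(self, now: float, live_engines: Set[str]) -> List[LockKey]:
        """Break locks held by dead engines or held too long; the returned
        keys are the resource actions to retrigger."""
        stale = [k for k, lk in self.locks.items()
                 if lk.engine_id not in live_engines
                 or now - lk.acquired_at > self.stale_after]
        for k in stale:
            del self.locks[k]
        return stale
```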

Maybe it's time we faced our fears; have people even tried ZooKeeper?

Honestly, I'm starting to wonder, because it has some really neat features if people would just try it out...

