[openstack-dev] [Heat] Using Job Queues for timeout ops
Joshua Harlow
harlowja at outlook.com
Thu Nov 13 21:50:07 UTC 2014
On Nov 13, 2014, at 10:59 AM, Clint Byrum <clint at fewbar.com> wrote:
> Excerpts from Zane Bitter's message of 2014-11-13 09:55:43 -0800:
>> On 13/11/14 09:58, Clint Byrum wrote:
>>> Excerpts from Zane Bitter's message of 2014-11-13 05:54:03 -0800:
>>>> On 13/11/14 03:29, Murugan, Visnusaran wrote:
>>>>> Hi all,
>>>>>
>>>>> The Convergence POC distributes stack operations by sending resource
>>>>> actions over RPC for any heat-engine to execute. The entire stack
>>>>> lifecycle will be controlled by worker/observer notifications. This
>>>>> distributed model has its own advantages and disadvantages.
>>>>>
>>>>> Any stack operation has a timeout, and a single engine is
>>>>> responsible for it. If that engine goes down, the timeout is lost
>>>>> along with it. The traditional fix is for the other engines to
>>>>> recreate the timeout from scratch. Also, a missed resource action
>>>>> notification will be detected only when the stack operation timeout
>>>>> fires.
>>>>>
>>>>> To overcome this, we will need the following capability:
>>>>>
>>>>> 1. Resource timeout (can be used for retry)
>>>>
>>>> I don't believe this is strictly needed for phase 1 (essentially we
>>>> don't have it now, so nothing gets worse).
>>>>
>>>
>>> We do have a stack timeout, and it stands to reason that we won't have a
>>> single box with a timeout greenthread after this, so a strategy is
>>> needed.
>>
>> Right, that was 2, but I was talking specifically about the resource
>> retry. I think we agree on both points.
>>
>>>> For phase 2, yes, we'll want it. One thing we haven't discussed much is
>>>> that if we used Zaqar for this then the observer could claim a message
>>>> but not acknowledge it until it had processed it, so we could have
>>>> guaranteed delivery.
>>>>
>>>
>>> Frankly, if oslo.messaging doesn't support reliable delivery then we
>>> need to add it.
>>
>> That is straight-up impossible with AMQP. Either you ack the message and
>> risk losing it if the worker dies before processing is complete, or you
>> don't ack the message until it's processed and you become a blocker for
>> every other worker trying to pull jobs off the queue. It works fine when
>> you have only one worker; otherwise not so much. This is the crux of the
>> whole "why isn't Zaqar just Rabbit" debate.
>>
>
> I'm not sure we have the same understanding of AMQP, so hopefully we can
> clarify here. This stackoverflow answer echoes my understanding:
>
> http://stackoverflow.com/questions/17841843/rabbitmq-does-one-consumer-block-the-other-consumers-of-the-same-queue
>
> Not ack'ing just means they might get retransmitted if we never ack. It
> doesn't block other consumers. And as the link above quotes from the
> AMQP spec, when there are multiple consumers, FIFO is not guaranteed.
> Other consumers get other messages.
>
> So just add the ability for a consumer to read, work, ack to
> oslo.messaging, and this is mostly handled via AMQP. Of course that
> also likely means no zeromq for Heat without accepting that messages
> may be lost if workers die.
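FWIW, the read/work/ack pattern you describe is already easy with Kombu
directly; a minimal sketch (queue name, broker URL and do_work() are all
invented for illustration):

    from kombu import Connection, Queue

    job_queue = Queue('heat-jobs')  # hypothetical queue name

    def handle_job(body, message):
        do_work(body)   # hypothetical; if this raises or the worker
                        # dies, the message stays unacked and gets
                        # redelivered to another consumer
        message.ack()   # ack only after the work has finished

    with Connection('amqp://guest:guest@localhost//') as conn:
        with conn.Consumer(job_queue, callbacks=[handle_job]):
            while True:
                conn.drain_events()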
>
> Basically we need to add something that is not "RPC" but instead a
> "job queue", one that mimics this:
>
> http://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/rpc/dispatcher.py#n131
>
> I've always been suspicious of this bit of code, as it basically means
> that if anything fails between that call and the one below it, we have
> lost contact. As long as clients are written to re-send when there is
> a lack of reply, there shouldn't be a problem. But for a job queue
> there is no reply, so the worker would dispatch, and then acknowledge
> only after the dispatched call had returned (including having
> completed the step where new messages are added to the queue for any
> newly-possible children).
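So roughly this shape, if I'm reading you right (hand-wavy pseudocode,
not the real oslo.messaging API; poll()/dispatch()/acknowledge() are
invented names):

    def job_worker_loop(listener, dispatcher):
        while True:
            incoming = listener.poll()
            # The RPC dispatcher effectively acks before the dispatched
            # call completes, which is the window where a crash loses
            # the message. A job queue must flip the order:
            dispatcher.dispatch(incoming)  # may enqueue child jobs
            incoming.acknowledge()         # ack only after dispatch
                                           # (and child enqueue) returned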
>
> Just to be clear, I believe what Zaqar adds is the ability to peek at
> a specific message ID and not affect it in the queue, which is entirely
> different than ACK'ing the ones you've already received in your session.
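Right, and the claim/delete flow Zane mentioned earlier would look
something like this with python-zaqarclient (endpoint and queue name
invented, and I may be off on the exact client API):

    from zaqarclient.queues import client

    cli = client.Client('http://zaqar.example.com:8888', version=2)
    queue = cli.queue('heat-work')
    claim = queue.claim(ttl=300, grace=60)  # hide msgs from other workers
    for msg in claim:
        process(msg.body)  # hypothetical work function
        msg.delete()       # only now is the message really gone; if we
                           # die first, the claim expires and another
                           # worker picks the message up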
>
>> Most stuff in OpenStack gets around this by doing synchronous calls
>> across oslo.messaging, where there is an end-to-end ack. We don't want
>> that here though. We'll probably have to make do with having ways to
>> recover after a failure (kick off another update with the same data is
>> always an option). The hard part is that if something dies we don't
>> really want to wait until the stack timeout to start recovering.
>>
>
> I fully agree. Josh's point about using a coordination service like
> Zookeeper to maintain liveness is an interesting one here. If we just
> make sure that all the workers that have claimed work off the queue are
> alive, that should be sufficient to prevent a hanging stack situation
> like you describe above.
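Exactly, and ephemeral znodes give us that liveness nearly for free.
With kazoo it's only a few lines (paths and ids invented):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()
    # An ephemeral node vanishes automatically when this engine's
    # session dies, so peers can watch for that and reschedule the
    # claimed work.
    zk.create('/heat/claims/%s' % stack_id, engine_id.encode(),
              ephemeral=True, makepath=True)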
>
>>> Zaqar should have nothing to do with this and is, IMO, a
>>> poor choice at this stage, though I like the idea of using it in the
>>> future so that we can make Heat more of an outside-the-cloud app.
>>
>> I'm inclined to agree that it would be hard to force operators to deploy
>> Zaqar in order to be able to deploy Heat, and that we should probably be
>> cautious for that reason.
>>
>> That said, from a purely technical point of view it's not a poor choice
>> at all - it has *exactly* the semantics we want (unlike AMQP), and at
>> least to the extent that the operator wants to offer Zaqar to users
>> anyway it completely eliminates a whole backend that they would
>> otherwise have to deploy. It's a tragedy that all of OpenStack has not
>> been designed to build upon itself in this way and it causes me physical
>> pain to know that we're about to perpetuate it.
>>
>>>>> 2. Recover from engine failure (loss of stack timeout, resource
>>>>> action notification)
>>>>>
>>>>> Suggestion:
>>>>>
>>>>> 1. Use a task queue like Celery to host timeouts for both stacks and resources.
>>>>
>>>> I believe Celery is more or less a non-starter as an OpenStack
>>>> dependency because it uses Kombu directly to talk to the queue, vs.
>>>> oslo.messaging which is an abstraction layer over Kombu, Qpid, ZeroMQ
>>>> and maybe others in the future. i.e. requiring Celery means that some
>>>> users would be forced to install Rabbit for the first time.
>>>>
>>>> One option would be to fork Celery and replace Kombu with oslo.messaging
>>>> as its abstraction layer. Good luck getting that maintained though,
>>>> since Celery _invented_ Kombu to be its abstraction layer.
>>>>
>>>
>>> A slight side point here: Kombu supports Qpid and ZeroMQ. Oslo.messaging
>>
>> You're right about Kombu supporting Qpid, it appears they added it. I
>> don't see ZeroMQ on the list though:
>>
>> http://kombu.readthedocs.org/en/latest/userguide/connections.html#transport-comparison
>>
>
> They, confusingly, call it zmq, and it may not be in a recent release:
>
> https://github.com/celery/kombu/blob/master/kombu/transport/zmq.py
>
>>> is more about having a unified API than a set of magic backends. It
>>> actually boggles my mind why we didn't just use kombu (cue 20 reactions
>>> with people saying it wasn't EXACTLY right), but I think we're committed
>>
>> Well, we also have to take into account the fact that Qpid support was
>> added only during the last 9 months, whereas oslo.messaging was
>> implemented 3 years ago and time travel hasn't been invented yet (for
>> any definition of 'yet').
>>
>
> Go back in time 3 years, and perhaps we could have done all the work
> we've done in Kombu instead. Hindsight, though.
+1 to this. I've seen the OpenStack community shy away from helping/improving other open source projects, which saddens me.
Kombu, I think, is in this category, but the future is unwritten and there is still hope!
>
>>> to oslo.messaging now. Anyway, celery would need no such refactor, as
>>> kombu would be able to access the same bus as everything else just fine.
>>
>> Interesting, so that would make it easier to get Celery added to the
>> global requirements, although we'd likely still have headaches to deal
>> with around configuration.
>>
>
> Yeah, I'm not advocating for celery, just pointing out that it has
> become more like what we already deploy. :)
>
>>>>> 2. Poll the database for engine failures and restart timers /
>>>>> retrigger resource retries (IMHO this is the traditional approach,
>>>>> and it weighs heavy).
>>>>>
>>>>> 3. Migrate Heat to use TaskFlow. (Too many code changes.)
>>>>
>>>> If it's just handling timed triggers (maybe this is closer to #2) and
>>>> not migrating the whole code base, then I don't see why it would be a
>>>> big change (or even a change at all - it's basically new functionality).
>>>> I'm not sure if TaskFlow has something like this already. If not we
>>>> could also look at what Mistral is doing with timed tasks and see if we
>>>> could spin some of it out into an Oslo library.
>>>>
>>>
>>> I feel like it boils down to something running periodically checking for
>>> scheduled tasks that are due to run but have not run yet. I wonder if we
>>> can actually look at Ironic for how they do this, because Ironic polls
>>> power state of machines constantly, and uses a hash ring to make sure
>>> only one conductor is polling any one machine at a time. If we broke
>>> stacks up into a hash ring like that for the purpose of singleton tasks
>>> like timeout checking, that might work out nicely.
>>
>> +1 for something like this, and +2 if we can get it from a library we
>> don't have to write ourselves (whether it be TaskFlow or something spun
>> out of Mistral or Ironic into Oslo).
>>
>
> Right, those things are fairly generic and would definitely fit nicely
> in a library.
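They're small, too; a toy consistent hash ring in the spirit of
Ironic's fits in ~15 lines (purely illustrative):

    import bisect
    import hashlib

    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing(object):
        def __init__(self, engines, replicas=100):
            # Many points per engine keeps the load even as engines
            # join and leave the ring.
            self._ring = sorted((_hash('%s-%d' % (e, i)), e)
                                for e in engines
                                for i in range(replicas))
            self._keys = [k for k, _ in self._ring]

        def owner(self, stack_id):
            # The engine whose point follows the stack's hash runs the
            # singleton tasks (e.g. timeout checks) for that stack.
            idx = bisect.bisect(self._keys, _hash(stack_id))
            return self._ring[idx % len(self._ring)][1]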
>
> So, the simplest possible solution, I think, is to lock resource id +
> graph version. Since we are scared of Zookeeper, we'll need a periodic
> job in the engines that looks for stale locks, or we have to wait for
> another stack operation to check for them.
Maybe it's time we face our fears; has anyone even tried ZooKeeper?
Honestly, I'm starting to wonder, because it has some really neat features if people would just try it out...
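And if we do stick with DB locks, the claim + reaper combo described
above could be as simple as this (table, column and helper names are
all made up, and db is some hypothetical handle):

    # Assumed schema: resource_locks(resource_id, graph_version,
    #                                engine_id, updated_at)
    # with a unique key on (resource_id, graph_version).

    def try_lock(db, resource_id, graph_version, engine_id):
        # Atomic claim: only one engine's INSERT can win the unique key.
        res = db.execute(
            "INSERT IGNORE INTO resource_locks"     # MySQL spelling
            " (resource_id, graph_version, engine_id, updated_at)"
            " VALUES (%s, %s, %s, NOW())",
            (resource_id, graph_version, engine_id))
        return res.rowcount == 1

    def reap_stale_locks(db, live_engine_ids):
        # Periodic job in every engine: free locks whose holder is no
        # longer alive (liveness from ZK, heartbeats, whatever we pick).
        db.execute(
            "DELETE FROM resource_locks WHERE engine_id NOT IN %s",
            (tuple(live_engine_ids),))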