[openstack-dev] [Heat] Using Job Queues for timeout ops
Jastrzebski, Michal
michal.jastrzebski at intel.com
Fri Nov 14 06:58:41 UTC 2014
> -----Original Message-----
> From: Joshua Harlow [mailto:harlowja at outlook.com]
> Sent: Thursday, November 13, 2014 10:50 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops
>
> On Nov 13, 2014, at 10:59 AM, Clint Byrum <clint at fewbar.com> wrote:
>
> > Excerpts from Zane Bitter's message of 2014-11-13 09:55:43 -0800:
> >> On 13/11/14 09:58, Clint Byrum wrote:
> >>> Excerpts from Zane Bitter's message of 2014-11-13 05:54:03 -0800:
> >>>> On 13/11/14 03:29, Murugan, Visnusaran wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> Convergence-POC distributes stack operations by sending resource
> >>>>> actions over RPC for any heat-engine to execute. Entire stack
> >>>>> lifecycle will be controlled by worker/observer notifications.
> >>>>> This distributed model has its own advantages and disadvantages.
> >>>>>
> >>>>> Any stack operation has a timeout and a single engine will be
> >>>>> responsible for it. If that engine goes down, timeout is lost
> >>>>> along with it. So a traditional way is for other engines to
> >>>>> recreate timeout from scratch. Also a missed resource action
> >>>>> notification will be detected only when stack operation timeout
> >>>>> happens.
> >>>>>
> >>>>> To overcome this, we will need the following capability:
> >>>>>
> >>>>> 1. Resource timeout (can be used for retry)
> >>>>
> >>>> I don't believe this is strictly needed for phase 1 (essentially we
> >>>> don't have it now, so nothing gets worse).
> >>>>
> >>>
> >>> We do have a stack timeout, and it stands to reason that we won't
> >>> have a single box with a timeout greenthread after this, so a
> >>> strategy is needed.
> >>
> >> Right, that was 2, but I was talking specifically about the resource
> >> retry. I think we agree on both points.
> >>
> >>>> For phase 2, yes, we'll want it. One thing we haven't discussed
> >>>> much is that if we used Zaqar for this then the observer could
> >>>> claim a message but not acknowledge it until it had processed it,
> >>>> so we could have guaranteed delivery.
> >>>>
> >>>
> >>> Frankly, if oslo.messaging doesn't support reliable delivery then we
> >>> need to add it.
> >>
> >> That is straight-up impossible with AMQP. Either you ack the message
> >> and risk losing it if the worker dies before processing is complete,
> >> or you don't ack the message until it's processed and you become a
> >> blocker for every other worker trying to pull jobs off the queue. It
> >> works fine when you have only one worker; otherwise not so much. This
> >> is the crux of the whole "why isn't Zaqar just Rabbit" debate.
> >>
> >
> > I'm not sure we have the same understanding of AMQP, so hopefully we
> > can clarify here. This stackoverflow answer echoes my understanding:
> >
> > http://stackoverflow.com/questions/17841843/rabbitmq-does-one-consumer-block-the-other-consumers-of-the-same-queue
> >
> > Not ack'ing just means they might get retransmitted if we never ack.
> > It doesn't block other consumers. And as the link above quotes from
> > the AMQP spec, when there are multiple consumers, FIFO is not
> > guaranteed.
> > Other consumers get other messages.
> >
> > So just add the ability for a consumer to read, work, ack to
> > oslo.messaging, and this is mostly handled via AMQP. Of course that
> > also likely means no zeromq for Heat without accepting that messages
> > may be lost if workers die.
> >
> > Basically we need to add something that is not "RPC" but instead
> > "jobqueue" that mimics this:
> >
> > http://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/rpc/dispatcher.py#n131
> >
> > I've always been suspicious of this bit of code, as it basically means
> > that if anything fails between that call, and the one below it, we
> > have lost contact, but as long as clients are written to re-send when
> > there is a lack of reply, there shouldn't be a problem. But, for a job
> > queue, there is no reply, and so the worker would dispatch, and then
> > acknowledge after the dispatched call had returned (including having
> > completed the step where new messages are added to the queue for any
> > newly-possible children).
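
To make the "read, work, ack" idea above concrete, here is a minimal sketch. It is a toy in-memory queue, not oslo.messaging and not a real AMQP client; the class AckQueue and all of its method names are hypothetical and exist only for illustration. It shows the two properties being argued about: an unacked message does not block other consumers, and it is redelivered if its consumer dies before acking.

```python
import collections


class AckQueue:
    """Toy in-memory queue with explicit acks (hypothetical, not oslo.messaging)."""

    def __init__(self, messages):
        self._ready = collections.deque(messages)
        self._unacked = {}  # delivery tag -> (consumer, message)
        self._next_tag = 0

    def get(self, consumer):
        """Deliver the next ready message to consumer; None if nothing is ready."""
        if not self._ready:
            return None
        tag = self._next_tag
        self._next_tag += 1
        message = self._ready.popleft()
        self._unacked[tag] = (consumer, message)
        return tag, message

    def ack(self, tag):
        """The consumer finished the work, so the message is dropped for good."""
        del self._unacked[tag]

    def consumer_died(self, consumer):
        """Requeue every message the dead consumer held without acking."""
        for tag, (owner, message) in sorted(self._unacked.items()):
            if owner == consumer:
                del self._unacked[tag]
                self._ready.append(message)


queue = AckQueue(["create A", "create B"])
tag_a, job_a = queue.get("engine-1")  # engine-1 holds "create A" unacked
tag_b, job_b = queue.get("engine-2")  # engine-2 still gets work, it is not blocked
queue.ack(tag_b)                      # engine-2 acks only after finishing
queue.consumer_died("engine-1")       # engine-1 dies before acking
tag_c, job_c = queue.get("engine-2")  # "create A" is delivered again
```

The cost of acking late is that a job can run more than once, so the resource action itself would have to be safe to repeat.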
> >
> > Just to be clear, I believe what Zaqar adds is the ability to peek at
> > a specific message ID and not affect it in the queue, which is
> > entirely different than ACK'ing the ones you've already received in your
> > session.
> >
> >> Most stuff in OpenStack gets around this by doing synchronous calls
> >> across oslo.messaging, where there is an end-to-end ack. We don't
> >> want that here though. We'll probably have to make do with having
> >> ways to recover after a failure (kick off another update with the
> >> same data is always an option). The hard part is that if something
> >> dies we don't really want to wait until the stack timeout to start
> >> recovering.
> >>
> >
> > I fully agree. Josh's point about using a coordination service like
> > Zookeeper to maintain liveness is an interesting one here. If we just
> > make sure that all the workers that have claimed work off the queue
> > are alive, that should be sufficient to prevent a hanging stack
> > situation like you describe above.
> >
> >>> Zaqar should have nothing to do with this and is, IMO, a poor choice
> >>> at this stage, though I like the idea of using it in the future so
> >>> that we can make Heat more of an outside-the-cloud app.
> >>
> >> I'm inclined to agree that it would be hard to force operators to
> >> deploy Zaqar in order to be able to deploy Heat, and that we should
> >> probably be cautious for that reason.
> >>
> >> That said, from a purely technical point of view it's not a poor
> >> choice at all - it has *exactly* the semantics we want (unlike AMQP),
> >> and at least to the extent that the operator wants to offer Zaqar to
> >> users anyway it completely eliminates a whole backend that they would
> >> otherwise have to deploy. It's a tragedy that all of OpenStack has
> >> not been designed to build upon itself in this way and it causes me
> >> physical pain to know that we're about to perpetuate it.
> >>
> >>>>> 2. Recover from engine failure (loss of stack timeout, resource
> >>>>> action notification)
> >>>>>
> >>>>> Suggestion:
> >>>>>
> >>>>> 1. Use task queue like celery to host timeouts for both stack and
> >>>>> resource.
> >>>>
> >>>> I believe Celery is more or less a non-starter as an OpenStack
> >>>> dependency because it uses Kombu directly to talk to the queue, vs.
> >>>> oslo.messaging which is an abstraction layer over Kombu, Qpid,
> >>>> ZeroMQ and maybe others in the future. i.e. requiring Celery means
> >>>> that some users would be forced to install Rabbit for the first time.
> >>>>
> >>>> One option would be to fork Celery and replace Kombu with
> >>>> oslo.messaging as its abstraction layer. Good luck getting that
> >>>> maintained though, since Celery _invented_ Kombu to be its
> >>>> abstraction layer.
> >>>>
> >>>
> >>> A slight side point here: Kombu supports Qpid and ZeroMQ.
> >>> Oslo.messaging
> >>
> >> You're right about Kombu supporting Qpid, it appears they added it. I
> >> don't see ZeroMQ on the list though:
> >>
> >>
> >> http://kombu.readthedocs.org/en/latest/userguide/connections.html#transport-comparison
> >>
> >
> > They, confusingly, call it zmq, and it may not be in a recent release:
> >
> > https://github.com/celery/kombu/blob/master/kombu/transport/zmq.py
> >
> >>> is more about having a unified API than a set of magic backends. It
> >>> actually boggles my mind why we didn't just use kombu (cue 20
> >>> reactions with people saying it wasn't EXACTLY right), but I think
> >>> we're committed
> >>
> >> Well, we also have to take into account the fact that Qpid support
> >> was added only during the last 9 months, whereas oslo.messaging was
> >> implemented 3 years ago and time travel hasn't been invented yet (for
> >> any definition of 'yet').
> >>
> >
> > Go back in time 3 years ago, and perhaps we could have done all the
> > work we've done in kombu. Hindsight though.
>
> +1 to this, I've seen the openstack community shy away from
> helping/improving other open source projects, which saddens me.
>
> Kombu I think is in this category, but the future is unwritten and there is still
> hope!
>
> >
> >>> to oslo.messaging now. Anyway, celery would need no such refactor,
> >>> as kombu would be able to access the same bus as everything else just
> >>> fine.
> >>
> >> Interesting, so that would make it easier to get Celery added to the
> >> global requirements, although we'd likely still have headaches to
> >> deal with around configuration.
> >>
> >
> > Yeah, I'm not advocating for celery, just pointing out that it has
> > become more like what we already deploy. :)
> >
> >>>>> 2. Poll database for engine failures and restart timers / retrigger
> >>>>> resource retry (IMHO: this would be the traditional approach and
> >>>>> weighs heavy)
> >>>>>
> >>>>> 3. Migrate heat to use TaskFlow. (Too many code changes)
> >>>>
> >>>> If it's just handling timed triggers (maybe this is closer to #2)
> >>>> and not migrating the whole code base, then I don't see why it
> >>>> would be a big change (or even a change at all - it's basically new
> >>>> functionality).
> >>>> I'm not sure if TaskFlow has something like this already. If not we
> >>>> could also look at what Mistral is doing with timed tasks and see
> >>>> if we could spin some of it out into an Oslo library.
> >>>>
> >>>
> >>> I feel like it boils down to something running periodically checking
> >>> for scheduled tasks that are due to run but have not run yet. I
> >>> wonder if we can actually look at Ironic for how they do this,
> >>> because Ironic polls power state of machines constantly, and uses a
> >>> hash ring to make sure only one conductor is polling any one machine
> >>> at a time. If we broke stacks up into a hash ring like that for the
> >>> purpose of singleton tasks like timeout checking, that might work out
> >>> nicely.
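
A rough sketch of the hash ring idea above, to show why it gives each stack exactly one timeout checker. This is not Ironic's implementation; HashRing, due_timeouts, the engine names and the replicas value are all made up for illustration, and it assumes every engine sees the same non-empty list of live engines.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string onto the ring as a large integer."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring mapping stack ids to engines (hypothetical sketch)."""

    def __init__(self, engines, replicas=32):
        # Each engine gets several points on the ring to even out the load.
        points = []
        for engine in engines:
            for i in range(replicas):
                points.append((_hash("%s-%d" % (engine, i)), engine))
        points.sort()
        self._keys = [point[0] for point in points]
        self._engines = [point[1] for point in points]

    def owner(self, stack_id):
        """Return the engine whose ring point follows the stack's hash."""
        idx = bisect.bisect(self._keys, _hash(stack_id)) % len(self._keys)
        return self._engines[idx]


def due_timeouts(ring, me, deadlines, now):
    """Stacks owned by engine `me` whose stored deadline has passed."""
    return sorted(stack for stack, deadline in deadlines.items()
                  if deadline <= now and ring.owner(stack) == me)


engines = ["engine-1", "engine-2", "engine-3"]
ring = HashRing(engines)
# Deadlines would live in the database so that any engine can read them.
deadlines = {"stack-%d" % i: 100 + i for i in range(20)}
claimed = [stack for engine in engines
           for stack in due_timeouts(ring, engine, deadlines, now=109)]
```

When an engine drops out, every remaining engine rebuilds the ring from the new list and only the stacks the dead engine owned move to a new owner.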
> >>
> >> +1 for something like this, and +2 if we can get it from a library we
> >> don't have to write ourselves (whether it be TaskFlow or something
> >> spun out of Mistral or Ironic into Oslo).
> >>
> >
> > Right, those things are fairly generic and would definitely fit nicely
> > in a library.
> >
> > So, the simplest possible solution, I think, is to lock resource id +
> > graph version. Since we are scared of Zookeeper, we'll need a periodic
> > job in the engines that looks for stale locks, or we have to wait for
> > another stack operation to check for them.
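
A sketch of the "lock on resource id + graph version, plus a periodic job for stale locks" idea above. The dict stands in for a database table; LockTable, the heartbeat column and the stale_after threshold are my assumptions, not anything that exists in Heat today.

```python
class LockTable:
    """Toy stand-in for a DB table of locks keyed on (resource id, graph version)."""

    def __init__(self, stale_after):
        self.stale_after = stale_after
        self._locks = {}  # (resource_id, graph_version) -> (engine_id, last_seen)

    def acquire(self, resource_id, graph_version, engine_id, now):
        """Take the lock if nobody holds it; return True on success."""
        key = (resource_id, graph_version)
        if key in self._locks:
            return False
        self._locks[key] = (engine_id, now)
        return True

    def heartbeat(self, engine_id, now):
        """A live engine refreshes the timestamp on every lock it holds."""
        for key, (owner, _) in list(self._locks.items()):
            if owner == engine_id:
                self._locks[key] = (owner, now)

    def reap_stale(self, now):
        """Periodic job: drop locks that were not refreshed, return keys to retry."""
        stale = sorted(key for key, (_, seen) in self._locks.items()
                       if now - seen > self.stale_after)
        for key in stale:
            del self._locks[key]
        return stale


locks = LockTable(stale_after=30)
first = locks.acquire("res-1", 7, "engine-1", now=0)    # engine-1 takes res-1
second = locks.acquire("res-1", 7, "engine-2", now=5)   # refused, already held
third = locks.acquire("res-2", 7, "engine-2", now=5)    # a different resource
locks.heartbeat("engine-2", now=40)                     # engine-1 never refreshes
reaped = locks.reap_stale(now=45)                       # only engine-1's lock is stale
retry = locks.acquire("res-1", 7, "engine-2", now=45)   # work can be picked up again
```

In a real table the acquire would have to be an atomic insert, and recovery takes as long as stale_after plus the polling interval.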
>
> Maybe it's time we face our fears, have people even tried zookeeper?
>
> Honestly I start to wonder, because it has some really neat features if people
> just try it out...
I did some work with it when I was researching a host monitoring feature for nova. It does neat things for monitoring processes: it can respond very quickly when a process dies (within milliseconds). Nova uses that, for example, to help decide which host to choose when spawning a VM. My question is: do we need that? Sure, any additional information about heat service status can be useful, but I can't see any real reason why we would want to make this part of the heat stack setup workflow.
Also, at the "Common approach to HA" session we raised something like oslo.healthcheck (or whatever it will be called), a common lib for service-group-like behavior. In my opinion it's pointless to implement zookeeper management in every project separately (it's already in nova). It might be worth looking closely into this topic.
> >
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev