[openstack-dev] [Heat] Using Job Queues for timeout ops

Jastrzebski, Michal michal.jastrzebski at intel.com
Fri Nov 14 06:50:54 UTC 2014



> -----Original Message-----
> From: Clint Byrum [mailto:clint at fewbar.com]
> Sent: Thursday, November 13, 2014 8:00 PM
> To: openstack-dev
> Subject: Re: [openstack-dev] [Heat] Using Job Queues for timeout ops
> 
> Excerpts from Zane Bitter's message of 2014-11-13 09:55:43 -0800:
> > On 13/11/14 09:58, Clint Byrum wrote:
> > > Excerpts from Zane Bitter's message of 2014-11-13 05:54:03 -0800:
> > >> On 13/11/14 03:29, Murugan, Visnusaran wrote:
> > >>> Hi all,
> > >>>
> > >>> Convergence-POC distributes stack operations by sending resource
> > >>> actions over RPC for any heat-engine to execute. The entire stack
> > >>> lifecycle will be controlled by worker/observer notifications.
> > >>> This distributed model has its own advantages and disadvantages.
> > >>>
> > >>> Any stack operation has a timeout, and a single engine will be
> > >>> responsible for it. If that engine goes down, the timeout is lost
> > >>> along with it, so the traditional approach is for other engines to
> > >>> recreate the timeout from scratch. Also, a missed resource action
> > >>> notification will be detected only when the stack operation
> > >>> timeout happens.
> > >>>
> > >>> To overcome this, we will need the following capability:
> > >>>
> > >>> 1. Resource timeout (can be used for retry)
> > >>
> > >> I don't believe this is strictly needed for phase 1 (essentially we
> > >> don't have it now, so nothing gets worse).
> > >>
> > >
> > > We do have a stack timeout, and it stands to reason that we won't
> > > have a single box with a timeout greenthread after this, so a
> > > strategy is needed.
> >
> > Right, that was 2, but I was talking specifically about the resource
> > retry. I think we agree on both points.
> >
> > >> For phase 2, yes, we'll want it. One thing we haven't discussed
> > >> much is that if we used Zaqar for this then the observer could
> > >> claim a message but not acknowledge it until it had processed it,
> > >> so we could have guaranteed delivery.
> > >>
> > >
> > > Frankly, if oslo.messaging doesn't support reliable delivery then we
> > > need to add it.
> >
> > That is straight-up impossible with AMQP. Either you ack the message
> > and risk losing it if the worker dies before processing is complete,
> > or you don't ack the message until it's processed and you become a
> > blocker for every other worker trying to pull jobs off the queue. It
> > works fine when you have only one worker; otherwise not so much. This
> > is the crux of the whole "why isn't Zaqar just Rabbit" debate.
> >
> 
> I'm not sure we have the same understanding of AMQP, so hopefully we can
> clarify here. This stackoverflow answer echoes my understanding:
> 
> http://stackoverflow.com/questions/17841843/rabbitmq-does-one-consumer-block-the-other-consumers-of-the-same-queue
> 
> Not ack'ing just means they might get retransmitted if we never ack. It
> doesn't block other consumers. And as the link above quotes from the
> AMQP spec, when there are multiple consumers, FIFO is not guaranteed.
> Other consumers get other messages.
> 
> So just add the ability for a consumer to read, work, ack to oslo.messaging,
> and this is mostly handled via AMQP. Of course that also likely means no
> zeromq for Heat without accepting that messages may be lost if workers die.
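
To make the read/work/ack flow concrete, here is a rough sketch using Kombu
directly (not oslo.messaging; the queue name and run_job are placeholders I
made up):

    from kombu import Connection, Exchange, Queue

    jobs = Queue('heat-jobs', exchange=Exchange('heat-jobs', type='direct'),
                 routing_key='heat-jobs')

    def run_job(body):
        pass  # placeholder for the actual resource action / timeout work

    def handle(body, message):
        try:
            run_job(body)          # do the work first...
            message.ack()          # ...and ack only once it has completed
        except Exception:
            message.requeue()      # let another worker pick the job up

    with Connection('amqp://guest:guest@localhost//') as conn:
        with conn.Consumer(jobs, callbacks=[handle], accept=['json']) as consumer:
            # An unacked message only holds back this worker, not the queue.
            consumer.qos(prefetch_count=1)
            while True:
                conn.drain_events()

If the worker dies before the ack, the broker redelivers the message to
someone else; other consumers keep getting other messages in the meantime.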
> 
> Basically we need to add something that is not "RPC" but instead "jobqueue"
> that mimics this:
> 
> http://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/rpc/dispatcher.py#n131
> 
> I've always been suspicious of this bit of code, as it basically means that if
> anything fails between that call and the one below it, we have lost contact;
> but as long as clients are written to re-send when there is a lack of reply,
> there shouldn't be a problem. For a job queue, however, there is no reply, so
> the worker would dispatch, and then acknowledge after the dispatched call
> had returned (including having completed the step where new messages are
> added to the queue for any newly-possible children).
> 
> Just to be clear, I believe what Zaqar adds is the ability to peek at a specific
> message ID and not affect it in the queue, which is entirely different than
> ACK'ing the ones you've already received in your session.
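
For comparison, the claim/delete flow in Zaqar would look roughly like this
(a sketch assuming python-zaqarclient's v2 API; the queue name and process()
are made up, and a real deployment needs auth options in conf):

    from zaqarclient.queues import client

    def process(body):
        pass  # placeholder for the actual work

    cli = client.Client('http://zaqar.example.com:8888', version=2, conf={})
    queue = cli.queue('heat-timeouts')

    # Claimed messages stay in the queue but are invisible to other
    # consumers until the claim TTL expires or they are deleted.
    claim = queue.claim(ttl=300, grace=60)
    for msg in claim:
        process(msg.body)
        msg.delete()   # only now is the message really gone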
> 
> > Most stuff in OpenStack gets around this by doing synchronous calls
> > across oslo.messaging, where there is an end-to-end ack. We don't want
> > that here though. We'll probably have to make do with having ways to
> > recover after a failure (kick off another update with the same data is
> > always an option). The hard part is that if something dies we don't
> > really want to wait until the stack timeout to start recovering.
> >
> 
> I fully agree. Josh's point about using a coordination service like Zookeeper to
> maintain liveness is an interesting one here. If we just make sure that all the
> workers that have claimed work off the queue are alive, that should be
> sufficient to prevent a hanging stack situation like you describe above.
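
That liveness check could be as small as one ephemeral znode per engine,
something like this sketch with kazoo (paths and the engine id are made up):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
    zk.start()

    # The ephemeral node disappears automatically when this engine's session
    # dies, so work it had claimed can be re-queued by whoever notices.
    zk.create('/heat/engines/engine-1', ephemeral=True, makepath=True)

    # Any other engine (or a reaper job) can simply list who is alive:
    alive = set(zk.get_children('/heat/engines'))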
> 
> > > Zaqar should have nothing to do with this and is, IMO, a poor choice
> > > at this stage, though I like the idea of using it in the future so
> > > that we can make Heat more of an outside-the-cloud app.
> >
> > I'm inclined to agree that it would be hard to force operators to
> > deploy Zaqar in order to be able to deploy Heat, and that we should
> > probably be cautious for that reason.
> >
> > That said, from a purely technical point of view it's not a poor
> > choice at all - it has *exactly* the semantics we want (unlike AMQP),
> > and at least to the extent that the operator wants to offer Zaqar to
> > users anyway it completely eliminates a whole backend that they would
> > otherwise have to deploy. It's a tragedy that all of OpenStack has not
> > been designed to build upon itself in this way and it causes me
> > physical pain to know that we're about to perpetuate it.
> >
> > >>> 2. Recover from engine failure (loss of stack timeout, resource
> > >>> action notification)
> > >>>
> > >>> Suggestion:
> > >>>
> > >>> 1. Use a task queue like Celery to host timeouts for both stack
> > >>> and resource.
> > >>
> > >> I believe Celery is more or less a non-starter as an OpenStack
> > >> dependency because it uses Kombu directly to talk to the queue, vs.
> > >> oslo.messaging which is an abstraction layer over Kombu, Qpid,
> > >> ZeroMQ and maybe others in the future. i.e. requiring Celery means
> > >> that some users would be forced to install Rabbit for the first time.
> > >>
> > >> One option would be to fork Celery and replace Kombu with
> > >> oslo.messaging as its abstraction layer. Good luck getting that
> > >> maintained though, since Celery _invented_ Kombu to be its
> > >> abstraction layer.
> > >>
> > >
> > > A slight side point here: Kombu supports Qpid and ZeroMQ.
> > > Oslo.messaging
> >
> > You're right about Kombu supporting Qpid, it appears they added it. I
> > don't see ZeroMQ on the list though:
> >
> >
> > http://kombu.readthedocs.org/en/latest/userguide/connections.html#transport-comparison
> >
> 
> They, confusingly, call it zmq, and it may not be in a recent release:
> 
> https://github.com/celery/kombu/blob/master/kombu/transport/zmq.py
> 
> > > is more about having a unified API than a set of magic backends. It
> > > actually boggles my mind why we didn't just use kombu (cue 20
> > > reactions with people saying it wasn't EXACTLY right), but I think
> > > we're committed
> >
> > Well, we also have to take into account the fact that Qpid support was
> > added only during the last 9 months, whereas oslo.messaging was
> > implemented 3 years ago and time travel hasn't been invented yet (for
> > any definition of 'yet').
> >
> 
> Go back in time 3 years ago, and perhaps we could have done all the work
> we've done in kombu. Hindsight though.
> 
> > > to oslo.messaging now. Anyway, celery would need no such refactor,
> > > as kombu would be able to access the same bus as everything else
> > > just fine.
> >
> > Interesting, so that would make it easier to get Celery added to the
> > global requirements, although we'd likely still have headaches to deal
> > with around configuration.
> >
> 
> Yeah, I'm not advocating for celery, just pointing out that it has become more
> like what we already deploy. :)
> 
> > >>> 2. Poll the database for engine failures and restart timers /
> > >>> retrigger resource retries (IMHO: this would be the traditional
> > >>> approach and is heavyweight)
> > >>>
> > >>> 3. Migrate Heat to use TaskFlow. (Too many code changes)
> > >>
> > >> If it's just handling timed triggers (maybe this is closer to #2)
> > >> and not migrating the whole code base, then I don't see why it
> > >> would be a big change (or even a change at all - it's basically
> > >> new functionality).
> > >> I'm not sure if TaskFlow has something like this already. If not we
> > >> could also look at what Mistral is doing with timed tasks and see
> > >> if we could spin some of it out into an Oslo library.
> > >>
> > >
> > > I feel like it boils down to something running periodically checking
> > > for scheduled tasks that are due to run but have not run yet. I
> > > wonder if we can actually look at Ironic for how they do this,
> > > because Ironic polls power state of machines constantly, and uses a
> > > hash ring to make sure only one conductor is polling any one machine
> > > at a time. If we broke stacks up into a hash ring like that for the
> > > purpose of singleton tasks like timeout checking, that might work
> > > out nicely.
> >
> > +1 for something like this, and +2 if we can get it from a library we
> > don't have to write ourselves (whether it be TaskFlow or something
> > spun out of Mistral or Ironic into Oslo).
> >
> 
> Right, those things are fairly generic and would definitely fit nicely in a library.
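
To make the hash ring idea concrete, a toy version (plain hashlib, nothing
Ironic-specific, engine names made up) could look like:

    import bisect
    import hashlib

    class HashRing(object):
        """Map each stack id to exactly one engine, consistently."""

        def __init__(self, engines, replicas=16):
            self._ring = sorted(
                (self._hash('%s-%d' % (engine, i)), engine)
                for engine in engines for i in range(replicas))
            self._keys = [key for key, _ in self._ring]

        @staticmethod
        def _hash(value):
            return int(hashlib.md5(value.encode('utf-8')).hexdigest(), 16)

        def get_engine(self, stack_id):
            idx = bisect.bisect(self._keys, self._hash(stack_id)) % len(self._ring)
            return self._ring[idx][1]

    ring = HashRing(['engine-1', 'engine-2', 'engine-3'])
    owner = ring.get_engine('a-stack-uuid')  # only this engine checks its timeout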
> 
> So, the simplest possible solution, I think, is to lock resource id + graph
> version. Since we are scared of Zookeeper, we'll need a periodic job in the
> engines that looks for stale locks, or we have to wait for another stack
> operation to check for them.
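
For what it's worth, that periodic stale-lock job could be as simple as the
sketch below (hypothetical table and column names in SQLAlchemy style, not
Heat's actual schema):

    import datetime

    from sqlalchemy import Column, DateTime, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class ResourceLock(Base):  # hypothetical model, not Heat's real schema
        __tablename__ = 'resource_lock'
        resource_id = Column(String(36), primary_key=True)
        engine_id = Column(String(36))
        updated_at = Column(DateTime)

    STALE_AFTER = datetime.timedelta(minutes=5)

    def reap_stale_locks(session, alive_engine_ids):
        """Free locks whose owning engine has stopped updating them."""
        cutoff = datetime.datetime.utcnow() - STALE_AFTER
        stale = (session.query(ResourceLock)
                 .filter(ResourceLock.updated_at < cutoff)
                 .filter(~ResourceLock.engine_id.in_(alive_engine_ids))
                 .all())
        for lock in stale:
            session.delete(lock)  # or re-trigger the resource action here
        session.commit()

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    reap_stale_locks(sessionmaker(bind=engine)(), alive_engine_ids=['engine-1'])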

There is a spec with something similar: https://review.openstack.org/#/c/122597/
However, I'd rather do convergence in a way where we won't have to monitor that - that is, in a way where we don't care how many engines are working, as long as at least one of them is.

> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


