Open Stack

Tue Nov 20 15:12:29 UTC 2012

On 11/20/2012 12:23 AM, Mike Wilson wrote:
> Hey folks,
> 
> I've been spending some time with qpid recently investigating a bug
> where compute nodes randomly lose their binding to their
> compute.hostname topics. When this happens, starting new instances,
> deleting them, and a lot of other functionality that is addressed directly
> to the compute node topic silently fails. Anything that is a "cast" instead
> of a "call" just fails: no errors, no logging, etc. This is because the
> message goes to the exchange, but since no one is listening on the
> compute topic it is silently dropped. Apparently there are ways to deal
> with this, such as setting up a DLQ. The AMQP spec is also built to error
> out when this happens if certain flags are set; see the following for more info:
> 
> http://qpid.2158936.n2.nabble.com/How-to-know-when-a-message-could-not-be-enqueued-td3751016.html#a3751626
> 
> In any case, I'm still not quite set on how I will handle this. I'm
> leaning towards implementing the discard-unroutable property in qpid and
> handling the exception in the sender, but I'm still not sure that is the
> best way to go about it. I'm considering using queues as an alternative
> way to communicate with nodes. They are fairly persistent, so if there
> isn't a receiver on the line when we send the message, the node could pick
> it up later. I'm looking for some feedback from the community on this, as I
> would like whatever work I'm doing to make it upstream. Thx in advance.

We should start by defining what behavior we want.  I agree with what
you say here at the end.  Ideally when a message is sent to 'compute' or
'compute.<node>' but nothing is currently listening, we want that
message to be queued up and waiting for a compute node to come back
alive and handle it.  (We should be setting a TTL on all messages to
ensure that they don't stay in a queue forever, but we're not doing
that yet.)
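
To make the two behaviors concrete, here is a minimal sketch using a toy
in-memory model in Python. It is not qpid code: the ToyBroker class, its
method names, the host name, and the TTL values are all hypothetical and only
illustrate the semantics. With no binding, a published message is dropped;
with a queue that outlives its consumer, the message waits until it is
consumed or its TTL expires.

```python
import time
from collections import deque


class ToyBroker:
    """In-memory stand-in for a broker (hypothetical, not the qpid API)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.bindings = {}  # routing key -> queue name
        self.queues = {}    # queue name -> deque of (expires_at, body)
        self.dropped = []   # bodies that had no route

    def bind(self, key, queue):
        self.queues.setdefault(queue, deque())
        self.bindings[key] = queue

    def publish(self, key, body, ttl=60.0):
        """Return True if the message was queued, False if it was dropped."""
        queue = self.bindings.get(key)
        if queue is None:
            self.dropped.append(body)  # unroutable: nobody is told about it
            return False
        self.queues[queue].append((self.clock() + ttl, body))
        return True

    def consume(self, queue):
        """Return the next unexpired message body, or None if there is none."""
        pending = self.queues[queue]
        now = self.clock()
        while pending:
            expires_at, body = pending.popleft()
            if expires_at > now:
                return body
        return None


def demo():
    now = [0.0]  # fake clock so the example does not need to sleep
    broker = ToyBroker(clock=lambda: now[0])

    # Nothing is bound to the key: the cast is silently dropped.
    lost = broker.publish("compute.host1", "run_instance")

    # The queue and binding exist even though no consumer is attached yet.
    broker.bind("compute.host1", "compute.host1")
    broker.publish("compute.host1", "terminate_instance", ttl=5.0)
    kept = broker.publish("compute.host1", "run_instance", ttl=60.0)

    now[0] = 10.0  # the compute node comes back 10 seconds later
    first = broker.consume("compute.host1")   # the expired message is skipped
    second = broker.consume("compute.host1")  # the queue is now empty
    return lost, kept, first, second
```

In this model the short-TTL message is discarded once it has expired, which
is the behavior the TTL is meant to give us for stale requests.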

Is the fact that it's a topic exchange messing this up?  AFAIK, nothing
makes use of the fact that these are topic exchanges, except maybe
notifications (rpc_notifier), so we need to watch out for that.

For the 'compute' and 'compute.<node>' style queues used by all of the
services, I believe queues on a direct exchange would work just fine for
the semantics we care about.
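
As a rough illustration of why that should hold, the sketch below compares
topic-style and direct-style key matching. The two helper functions are
hypothetical and written only for this example (they are not broker code),
and the host names are made up. It assumes the usual AMQP topic rules, where
'*' matches exactly one word and '#' matches zero or more words. As long as
the binding keys contain no wildcards, both exchange types route the same way.

```python
def topic_matches(pattern, key):
    """Topic-exchange match: '*' is exactly one word, '#' is zero or more."""

    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may swallow any number of leading words, including none.
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), key.split("."))


def direct_matches(binding_key, key):
    """Direct-exchange match: the routing key must equal the binding key."""
    return binding_key == key


def same_routing(binding_keys, routing_keys):
    """True if topic and direct matching agree for every pair of keys."""
    return all(
        topic_matches(b, r) == direct_matches(b, r)
        for b in binding_keys
        for r in routing_keys
    )


# Literal keys like the ones the services use: both exchange types agree.
literal = same_routing(
    ["compute", "compute.host1"],
    ["compute", "compute.host1", "compute.host2"],
)

# A wildcard binding is where the two exchange types diverge.
wildcard = same_routing(["compute.*"], ["compute.host1"])
```

Here `literal` is True and `wildcard` is False, so only a consumer that binds
with wildcards would notice a move to a direct exchange.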

-- 
Russell Bryant

[openstack-dev] Why topics instead of queues to communicate with compute nodes?