[openstack-dev] [Oslo] First steps towards amqp 1.0

Flavio Percoco flavio at redhat.com
Thu Dec 12 14:14:42 UTC 2013


On 11/12/13 09:31 -0500, Andrew Laski wrote:
>On 12/10/13 at 11:09am, Flavio Percoco wrote:
>>On 09/12/13 17:37 -0500, Russell Bryant wrote:
>>>On 12/09/2013 05:16 PM, Gordon Sim wrote:
>>>>On 12/09/2013 07:15 PM, Russell Bryant wrote:
>>
>>[...]
>>
>>>>>>One other pattern that can benefit from intermediated message flow is in
>>>>>>load balancing. If the processing entities are effectively 'pulling'
>>>>>>messages, this can more naturally balance the load according to capacity
>>>>>>than when the producer of the workload is trying to determine the best
>>>>>>balance.
>>>>>
>>>>>Yes, that's another factor.  Today, we rely on the message broker's
>>>>>behavior to equally distribute messages to a set of consumers.
>>>>
>>>>Sometimes you even _want_ message distribution to be 'unequal', if the
>>>>load varies by message or the capacity by consumer. E.g. If one consumer
>>>>is particularly slow (or is given a particularly arduous task), it may
>>>>not be optimal for it to receive the same portion of subsequent messages
>>>>as other less heavily loaded or more powerful consumers.
>>>
>>>Indeed.  We haven't tried to do that anywhere, but it would be an
>>>improvement for some cases.
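
To make the 'pulling' point above concrete, here is a minimal simulation sketch. It is not oslo.messaging or broker code; the function names and the per-consumer cost figures are made up for illustration. It compares consumers that pull the next message whenever they become idle with a strict round-robin push.

```python
import heapq


def pull_distribution(task_costs, secs_per_unit):
    """Count messages per consumer when idle consumers pull the next message.

    secs_per_unit[i] is a hypothetical figure: how long consumer i needs
    per unit of work (higher means slower).
    """
    # Heap of (time at which the consumer becomes idle, consumer index).
    idle_at = [(0, i) for i in range(len(secs_per_unit))]
    heapq.heapify(idle_at)
    handled = [0] * len(secs_per_unit)
    for cost in task_costs:
        now, idx = heapq.heappop(idle_at)  # the earliest idle consumer pulls
        handled[idx] += 1
        heapq.heappush(idle_at, (now + cost * secs_per_unit[idx], idx))
    return handled


def round_robin_distribution(task_costs, secs_per_unit):
    """Count messages per consumer when messages are pushed in strict rotation."""
    handled = [0] * len(secs_per_unit)
    for n, _cost in enumerate(task_costs):
        handled[n % len(secs_per_unit)] += 1
    return handled


if __name__ == "__main__":
    tasks = [1] * 12
    speeds = [3, 1]  # consumer 0 is three times slower than consumer 1
    print(pull_distribution(tasks, speeds))         # [3, 9]
    print(round_robin_distribution(tasks, speeds))  # [6, 6]
```

With pulling, the slow consumer ends up with a quarter of the messages without the producer knowing anything about capacity.
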
>>
>>Agreed, this is something worth experimenting with.
>>
>>[...]
>>
>>>>>I'm very interested in diving deeper into how Dispatch would fit into
>>>>>the various ways OpenStack is using messaging today.  I'd like to get
>>>>>a better handle on how the use of Dispatch as an intermediary would
>>>>>scale out for a deployment that consists of 10s of thousands of
>>>>>compute nodes, for example.
>>>>>
>>>>>Is it roughly just that you can have a network of N Dispatch routers
>>>>>that route messages from point A to point B, and for notifications we
>>>>>would use a traditional message broker (qpidd or rabbitmq) ?
>>>>
>>>>For scaling the basic idea is that not all connections are made to the
>>>>same process and therefore not all messages need to travel through a
>>>>single intermediary process.
>>>>
>>>>So for N different routers, each has a portion of the total number of
>>>>publishers and consumers connected to it. Though clients can
>>>>communicate even if they are not connected to the same router, each
>>>>router only needs to handle the messages sent by the publishers directly
>>>>attached, or sent to the consumers directly attached. It never needs to
>>>>see messages between publishers and consumers that are not directly
>>>>attached.
>>>>
>>>>To address your example, the 10s of thousands of compute nodes would be
>>>>spread across N routers. Assuming these were all interconnected, a
>>>>message from the scheduler would only travel through at most two of
>>>>these N routers (the one the scheduler was connected to and the one the
>>>>receiving compute node was connected to). No process needs to be able to
>>>>handle 10s of thousands of connections itself (as contrasted with full
>>>>direct, non-intermediated communication, where the scheduler would need
>>>>to manage connections to each of the compute nodes).
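
The arithmetic behind this can be sketched as follows. This is a back-of-the-envelope model, not anything taken from Dispatch itself: it assumes clients are spread evenly and that the routers form a full mesh, and the figures of 20000 compute nodes and 20 routers are only illustrative.

```python
import math


def connections_per_router(clients, routers):
    """Connections one router holds: its share of clients plus mesh links.

    Assumes an even spread of clients and a fully interconnected mesh.
    """
    client_links = math.ceil(clients / routers)
    mesh_links = routers - 1
    return client_links + mesh_links


def routers_on_path(sender_router, receiver_router):
    """Routers a message crosses in a full mesh: one if shared, else two."""
    return 1 if sender_router == receiver_router else 2


if __name__ == "__main__":
    # Illustrative numbers only.
    print(connections_per_router(20000, 20))  # 1019
    print(routers_on_path(0, 5))              # 2
```

In the direct, non-intermediated case the scheduler alone would hold all 20000 connections; here no single process holds more than about a thousand.
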
>>>>
>>>>This basic pattern is the same as networks of brokers, but Dispatch
>>>>router has been designed from the start to simply focus on that problem
>>>>(and not deal with all other broker related features, such as
>>>>transactions, durability, specialised queueing etc).
>>>
>>>Sounds awesome.  :-)
>>>
>>>>The other difference is that Dispatch Router does not accept
>>>>responsibility for messages, i.e. it does not offer any
>>>>store-and-forward behaviour. Any acknowledgement is end-to-end. This
>>>>avoids it having to replicate messages. On failure they can, if needed, be
>>>>replayed by the original sender.
>>>
>>>I think the lack of store-and-forward is OK.
>>>
>>>Right now, all of the Nova code is written to assume that the messaging
>>>is unreliable and that any message could get lost.  It may result in an
>>>operation failing, but it should fail gracefully.  Doing end-to-end
>>>acknowledgement may actually be an improvement.
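
A minimal sketch of end-to-end acknowledgement with replay by the original sender, assuming nothing about the real AMQP 1.0 or Dispatch APIs. The `Sender`, `Receiver` and `lossy_link` names are hypothetical, and the "network" is just a function call that may drop a message.

```python
class Receiver:
    """Processes each message once and acknowledges every delivery."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def receive(self, msg_id, payload):
        if msg_id not in self.seen:  # a replay may deliver a duplicate
            self.seen.add(msg_id)
            self.processed.append(payload)
        return True  # the acknowledgement travelling back to the sender


class Sender:
    """Keeps every message until the receiver has acknowledged it."""

    def __init__(self, transport):
        self.transport = transport  # callable returning True when acked
        self.unacked = {}           # message id -> payload

    def send(self, msg_id, payload):
        self.unacked[msg_id] = payload
        self._deliver(msg_id)

    def replay(self):
        """Re-send everything that has not been acknowledged yet."""
        for msg_id in list(self.unacked):
            self._deliver(msg_id)

    def _deliver(self, msg_id):
        if self.transport(msg_id, self.unacked[msg_id]):
            del self.unacked[msg_id]


def lossy_link(receiver, drop_first_attempt_of):
    """Intermediary that stores nothing and loses the first attempt of some ids."""
    dropped = set()

    def transport(msg_id, payload):
        if msg_id in drop_first_attempt_of and msg_id not in dropped:
            dropped.add(msg_id)
            return False  # lost in transit, so no acknowledgement comes back
        return receiver.receive(msg_id, payload)

    return transport


if __name__ == "__main__":
    receiver = Receiver()
    sender = Sender(lossy_link(receiver, {2}))
    for i in (1, 2, 3):
        sender.send(i, "task-%d" % i)
    print(sender.unacked)      # {2: 'task-2'}
    sender.replay()
    print(receiver.processed)  # ['task-1', 'task-3', 'task-2']
```

The intermediary never holds a copy of the message: the only state lives at the two ends, which is the property described above.
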
>>
>>This is interesting and a very important point. I wonder what the
>>reliability expectations of other services w.r.t OpenStack messaging
>>are.
>>
>>I agree on the fact that p2p acknowledgement could be an improvement
>>but I'm also wondering how this (if ever) will affect projects - in
>>terms of requiring changes. One of the goals of this new driver is to
>>not require any changes on the existing projects.
>>
>>Also, on a slightly different but related topic: are there cases where
>>tasks are re-scheduled in nova? If so, what does nova do in this case? Are
>>those tasks sent back to `nova-scheduler` for re-scheduling?
>
>Yes, there are certain build failures that can occur which will cause 
>a re-schedule.  That's currently accomplished by the compute node 
>sending a message back to the scheduler so it can pick a new host.  
>I'm trying to shift that a bit so we're messaging the conductor rather 
>than the scheduler, but the basic structure of it is going to remain 
>the same for now.
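
The re-schedule flow described above can be sketched roughly like this. It is not Nova's code: `pick_host`, `build_with_reschedule` and the `max_attempts` limit are hypothetical, and the message sent from the compute node back to the scheduler is collapsed into a plain loop.

```python
def pick_host(hosts, excluded):
    """Stand-in for the scheduler: first host that has not failed yet."""
    for host in hosts:
        if host not in excluded:
            return host
    return None


def build_with_reschedule(hosts, build, max_attempts=3):
    """Try to build on a host; on failure, ask for a new host and retry.

    `build` is a callable returning True on success. Each failed host is
    remembered so the next pick avoids it.
    """
    tried = []
    for _ in range(max_attempts):
        host = pick_host(hosts, tried)
        if host is None:
            break  # no candidate hosts left
        tried.append(host)
        if build(host):
            return host, tried
        # Build failed: in the real flow a message would go back to the
        # scheduler (or conductor) here so it can pick a new host.
    return None, tried


if __name__ == "__main__":
    result = build_with_reschedule(["a", "b", "c"], lambda h: h == "c")
    print(result)  # ('c', ['a', 'b', 'c'])
```
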
>
>If you mean in progress operations being restarted after a service is 
>restarted, then no.  We're working towards making that possible but at 
>the moment it doesn't exist.


This is very valuable information. I wonder if the same applies to
other projects. I'd expect cinder to behave pretty much the same way
nova does in this area. I'm not sure about neutron, though.

I've a draft in my head of how the amqp 1.0 driver could be
implemented and how to map the current expectations of the messaging
layer to the new protocol.

I think a separate thread to discuss this mapping is worth it. There
are some critical areas that definitely need more discussion and that
could be refactored.

Sorry for the vague reply. I'd like to organize these ideas a
bit more before throwing them out there.

I'm very happy about the discussions we had in this thread, though. I
personally think - and based on this thread, I'm not the only one -
that moving forward to amqp 1.0 is a huge improvement for OpenStack.

Cheers,
FF

-- 
@flaper87
Flavio Percoco