[openstack-dev] Zero MQ remove central broker. Architecture change.

yatin kumbhare yatinkumbhare at gmail.com
Wed Nov 19 04:50:36 UTC 2014


Hello Folks,

Here are a couple of slides/diagrams I documented for my own understanding
back in the Havana release, particularly slide 10 onward.

https://docs.google.com/presentation/d/1ZPWKXN7dzXs9bX3Ref9fPDiia912zsHCHNMh_VSMhJs/edit#slide=id.p

I am also committed to using ZeroMQ, as it's lightweight, fast, and scalable.

I would like to chip in on further ZeroMQ development.

Regards,
Yatin

On Wed, Nov 19, 2014 at 8:05 AM, Li Ma <skywalker.nick at gmail.com> wrote:

>
> On 2014/11/19 1:49, Eric Windisch wrote:
>
>   I think for this cycle we really do need to focus on consolidating and
>> testing the existing driver design and fixing up the biggest
>> deficiency (1) before we consider moving forward with lots of new
>
>
>  +1
>
>
>> 1) Outbound messaging connection re-use - right now every outbound
>> messaging creates and consumes a tcp connection - this approach scales
>> badly when neutron does large fanout casts.
>>
>
>
>  I'm glad you are looking at this; by doing so, you will understand the
> system better. I hope the following will give some insight into, at least,
> why I made the decisions I made:
>
> This was an intentional design trade-off. I saw three choices here: build
> a fully decentralized solution, build a fully-connected network, or use
> centralized brokerage. I wrote off centralized brokerage immediately. The
> problem with a fully connected system is that active TCP connections are
> required between all of the nodes. I didn't think that would scale and
> would be brittle against floods (intentional or otherwise).
>
>  IMHO, I always felt the right solution for large fanout casts was to use
> multicast. When the driver was written, Neutron didn't exist and there was
> no use-case for large fanout casts, so I didn't implement multicast, but
> knew it as an option if it became necessary. It isn't the right solution
> for everyone, of course.
>
>    Using multicast adds complexity to the switch forwarding plane, which
> has to enable and maintain multicast group communication. For large
> deployment scenarios, I prefer to keep forwarding simple and easy to
> maintain. IMO, running a set of fanout-router processes in the cluster
> can also achieve the goal.
> The data path is: openstack-daemon -- send the message (with fanout=true)
> --> fanout-router -- read the matchmaker --> send to the destinations.
> Essentially, it just uses unicast to simulate multicast.
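A minimal sketch of what such a fanout-router process could look like,
assuming pyzmq; the matchmaker interface (get_hosts) and the envelope
fields are illustrative, not the actual oslo.messaging driver:

    # Hypothetical fanout-router: pulls casts marked fanout=true and
    # re-sends each one over unicast PUSH connections, one per peer
    # returned by the matchmaker.
    import json
    import zmq

    def run_fanout_router(bind_addr, matchmaker):
        ctx = zmq.Context()
        inbound = ctx.socket(zmq.PULL)
        inbound.bind(bind_addr)
        while True:
            msg = inbound.recv()
            envelope = json.loads(msg.decode("utf-8"))
            if not envelope.get("fanout"):
                continue  # only fanout casts are handled here
            # Assumed matchmaker interface: returns "host:port" strings.
            for peer in matchmaker.get_hosts(envelope["topic"]):
                out = ctx.socket(zmq.PUSH)
                out.connect("tcp://%s" % peer)
                out.send(msg)   # unicast copy to each destination
                out.close()     # a real router would reuse these sockets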
>
>   For connection reuse, you could manage a pool of connections and keep
> those connections around for a configurable amount of time, after which
> they'd expire and be re-opened. This would keep the most actively used
> connections alive. One problem is that it would make the service more
> brittle by making it far more susceptible to running out of file
> descriptors by keeping connections around significantly longer. However,
> this wouldn't be as brittle as fully-connecting the nodes nor as poorly
> scalable.
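A rough sketch of the pooled-connection idea with a configurable expiry,
assuming pyzmq; the class and TTL handling are illustrative, not existing
driver code:

    # Outbound-connection pool: sockets are kept for ttl seconds after
    # last use and then closed, bounding both setup cost and open fds.
    import time
    import zmq

    class SocketPool(object):
        def __init__(self, context, ttl=60):
            self.context = context
            self.ttl = ttl
            self._pool = {}  # address -> (socket, last_used)

        def get(self, address):
            self._expire()
            if address not in self._pool:
                sock = self.context.socket(zmq.PUSH)
                sock.connect(address)
            else:
                sock = self._pool[address][0]
            self._pool[address] = (sock, time.time())
            return sock

        def _expire(self):
            now = time.time()
            for address, (sock, last_used) in list(self._pool.items()):
                if now - last_used > self.ttl:
                    sock.close()
                    del self._pool[address]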
>
>    +1. Setting a large number of fds is not a problem. Because we use a
> socket pool, we can control and keep a fixed number of fds.
>
>   If OpenStack and oslo.messaging were designed specifically around this
> message pattern, I might suggest that the library and its applications be
> aware of high-traffic topics and persist the connections for those topics,
> while keeping others ephemeral. A good example for Nova would be
> api->scheduler traffic would be persistent, whereas scheduler->compute_node
> would be ephemeral.  Perhaps this is something that could still be added to
> the library.
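One hypothetical shape for that topic-awareness, reusing the SocketPool
sketched above; none of these names are an existing oslo.messaging option:

    import zmq

    # Hypothetical per-topic policy: high-traffic topics keep their
    # sockets pooled, everything else is opened and closed per cast.
    PERSISTENT_TOPICS = {"scheduler", "conductor"}

    def send_cast(context, pool, address, topic, payload):
        if topic in PERSISTENT_TOPICS:
            pool.get(address).send(payload)   # reused, e.g. api -> scheduler
        else:
            sock = context.socket(zmq.PUSH)   # ephemeral, e.g. -> compute
            sock.connect(address)
            sock.send(payload)
            sock.close()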
>
>  2) PUSH/PULL tcp sockets - Pieter suggested we look at ROUTER/DEALER
>> as an option once 1) is resolved - this socket type pairing has some
>> interesting features which would help with resilience and availability
>> including heartbeating.
>
>
>  Using PUSH/PULL does not eliminate the possibility of being fully
> connected, nor is it incompatible with persistent connections. If you're
> not going to be fully connected, there isn't much advantage to long-lived
> persistent connections, and without those persistent connections you're not
> benefiting from features such as heartbeating.
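For concreteness, a bare-bones sketch of a long-lived DEALER connection
carrying an application-level heartbeat, assuming pyzmq; the point is that
heartbeating only pays off if the connection really is kept open. This is
illustrative, not the proposed driver design:

    import time
    import zmq

    def dealer_loop(server_addr, heartbeat_interval=5.0):
        ctx = zmq.Context()
        sock = ctx.socket(zmq.DEALER)
        sock.connect(server_addr)       # one persistent connection
        poller = zmq.Poller()
        poller.register(sock, zmq.POLLIN)
        last_beat = 0.0
        while True:
            if time.time() - last_beat >= heartbeat_interval:
                sock.send(b"HEARTBEAT")
                last_beat = time.time()
            if poller.poll(timeout=1000):        # milliseconds
                frames = sock.recv_multipart()
                # dispatch replies / heartbeat acks here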
>
>    How about REQ/REP? I think it is appropriate for long-lived persistent
> connections and also provides reliability due to the reply.
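For reference, a minimal REQ/REP round trip with a timeout, assuming pyzmq.
The strict send/recv lockstep means a missed reply forces the socket to be
discarded, which is part of the pain described in the anecdote below:

    import zmq

    def req_call(server_addr, payload, timeout_ms=5000):
        ctx = zmq.Context()
        sock = ctx.socket(zmq.REQ)
        sock.setsockopt(zmq.LINGER, 0)   # don't block on close
        sock.connect(server_addr)
        sock.send(payload)               # REQ must alternate send/recv
        if sock.poll(timeout_ms, zmq.POLLIN):
            reply = sock.recv()
        else:
            reply = None                 # timed out; rebuild the socket
        sock.close()
        return reply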
>
>   I'm not saying ROUTER/DEALER cannot be used, but use them with care.
> They're designed for long-lived channels between hosts and not for the
> ephemeral-type connections used in a peer-to-peer system. Dealing with how
> to manage timeouts on the client and the server and the swelling number of
> active file descriptors that you'll get by using ROUTER/DEALER is not
> trivial, assuming you can get past the management of all of those
> synchronous sockets (hidden away by tons of eventlet greenthreads)...
>
>  Extra anecdote: During a conversation at the OpenStack summit, someone
> told me about their experiences using ZeroMQ and the pain of using REQ/REP
> sockets and how they felt it was a mistake they used them. We discussed a
> bit about some other problems such as the fact it's impossible to avoid TCP
> fragmentation unless you force all frames to 552 bytes or have a
> well-managed network where you know the MTUs of all the devices you'll pass
> through. Suggestions were made to make ZeroMQ better, until we realized we
> had just described TCP-over-ZeroMQ-over-TCP, finished our beers, and
> quickly changed topics.
>
> Well, it seems I need to take my last question back. In our deployment, I
> always take advantage of jumbo frames to increase throughput. You said that
> REQ/REP would introduce TCP fragmentation unless ZeroMQ frames == 552
> bytes? Could you please elaborate?
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>