[openstack-dev] Zero MQ remove central broker. Architecture change.

Li Ma skywalker.nick at gmail.com
Thu Nov 20 02:32:23 UTC 2014


Hi Yatin,

Thanks for sharing your presentation. It looks great. You are welcome to 
contribute to the ZeroMQ driver.

Cheers,
Li Ma

On 2014/11/19 12:50, yatin kumbhare wrote:
> Hello Folks,
>
> Here are a couple of slides/diagrams I documented for my own understanding 
> back in the Havana release, particularly slide no. 10 onward.
>
> https://docs.google.com/presentation/d/1ZPWKXN7dzXs9bX3Ref9fPDiia912zsHCHNMh_VSMhJs/edit#slide=id.p
>
> I am also committed to using ZeroMQ, as it's lightweight, fast, and scalable.
>
> I would like to chip in for further development regarding zeromq.
>
> Regards,
> Yatin
>
> On Wed, Nov 19, 2014 at 8:05 AM, Li Ma <skywalker.nick at gmail.com 
> <mailto:skywalker.nick at gmail.com>> wrote:
>
>
>     On 2014/11/19 1:49, Eric Windisch wrote:
>>
>>         I think for this cycle we really do need to focus on
>>         consolidating and testing the existing driver design and
>>         fixing up the biggest deficiency (1) before we consider
>>         moving forward with lots of new
>>
>>
>>     +1
>>
>>         1) Outbound messaging connection re-use - right now every
>>         outbound message creates and consumes a TCP connection - this
>>         approach scales badly when Neutron does large fanout casts.
>>
>>
>>
>>     I'm glad you are looking at this and by doing so, will understand
>>     the system better. I hope the following will give some insight
>>     into, at least, why I made the decisions I made:
>>     This was an intentional design trade-off. I saw three choices
>>     here: build a fully decentralized solution, build a
>>     fully-connected network, or use centralized brokerage. I wrote
>>     off centralized brokerage immediately. The problem with a fully
>>     connected system is that active TCP connections are required
>>     between all of the nodes. I didn't think that would scale, and it
>>     would be brittle against floods (intentional or otherwise).
>>
>>     IMHO, I always felt the right solution for large fanout casts was
>>     to use multicast. When the driver was written, Neutron didn't
>>     exist and there was no use-case for large fanout casts, so I
>>     didn't implement multicast, but knew it as an option if it became
>>     necessary. It isn't the right solution for everyone, of course.
>>
>     Using multicast adds complexity to the switch forwarding plane,
>     which has to enable and maintain multicast group communication.
>     For large deployment scenarios, I prefer to keep forwarding simple
>     and easy to maintain. IMO, running a set of fanout-router
>     processes in the cluster can also achieve the goal. The data path
>     is: openstack-daemon --(send the message with fanout=true)-->
>     fanout-router --(read the matchmaker)--> send to the destinations.
>     In effect, it just uses unicast to simulate multicast.
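>
>     A rough pyzmq sketch of such a fanout-router loop (illustrative
>     only; the get_hosts() matchmaker call and the port number are
>     assumptions, not the actual driver code):
>
>         import zmq
>
>         def fanout_router(matchmaker, topic, bind_addr="tcp://*:9501"):
>             # Collect fanout-tagged messages from the local daemons.
>             ctx = zmq.Context.instance()
>             frontend = ctx.socket(zmq.PULL)
>             frontend.bind(bind_addr)
>
>             while True:
>                 msg = frontend.recv_multipart()
>                 # Ask the matchmaker for every host subscribed to the
>                 # topic and unicast a copy to each of them.
>                 for host in matchmaker.get_hosts(topic):
>                     # (in practice these sockets would come from a pool)
>                     backend = ctx.socket(zmq.PUSH)
>                     backend.connect("tcp://%s:9501" % host)
>                     backend.send_multipart(msg)
>                     backend.close()
>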
>>     For connection reuse, you could manage a pool of connections and
>>     keep those connections around for a configurable amount of time,
>>     after which they'd expire and be re-opened. This would keep the
>>     most actively used connections alive. One problem is that it
>>     would make the service more brittle by making it far more
>>     susceptible to running out of file descriptors by keeping
>>     connections around significantly longer. However, this wouldn't
>>     be as brittle as fully-connecting the nodes nor as poorly scalable.
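>>
>>     As a rough illustration of that idea (purely a sketch with assumed
>>     names and a made-up TTL, not the driver's actual code):
>>
>>         import time
>>         import zmq
>>
>>         class ConnectionPool(object):
>>             """Keep outbound PUSH sockets alive for a while, then expire them."""
>>
>>             def __init__(self, context, ttl=30):
>>                 self.context = context
>>                 self.ttl = ttl      # seconds a connection may sit idle
>>                 self.pool = {}      # address -> (socket, last_used)
>>
>>             def get(self, address):
>>                 self._expire()
>>                 sock, _ = self.pool.get(address, (None, None))
>>                 if sock is None:
>>                     sock = self.context.socket(zmq.PUSH)
>>                     sock.connect(address)
>>                 self.pool[address] = (sock, time.time())
>>                 return sock
>>
>>             def _expire(self):
>>                 # Close sockets idle longer than the TTL so the fd count
>>                 # stays bounded by the set of actively used peers.
>>                 now = time.time()
>>                 for address, (sock, last_used) in list(self.pool.items()):
>>                     if now - last_used > self.ttl:
>>                         sock.close()
>>                         del self.pool[address]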
>>
>     +1. Setting a large fd limit is not a problem. Because we use a
>     socket pool, we can control and keep a fixed number of fds.
>>     If OpenStack and oslo.messaging were designed specifically around
>>     this message pattern, I might suggest that the library and its
>>     applications be aware of high-traffic topics and persist the
>>     connections for those topics, while keeping others ephemeral. A
>>     good example for Nova: api->scheduler traffic would be
>>     persistent, whereas scheduler->compute_node would be ephemeral.
>>     Perhaps this is something that could still be added to the library.
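>>
>>     For instance (just a sketch, with example topic names and TTLs):
>>
>>         # Topics whose connections are worth keeping open for a long
>>         # time; everything else gets a short-lived connection.
>>         PERSISTENT_TOPICS = {'scheduler'}   # e.g. api -> scheduler
>>
>>         def ttl_for_topic(topic):
>>             # An hour for high-traffic topics, seconds for the rest;
>>             # this would feed the pool's expiry logic above.
>>             return 3600 if topic in PERSISTENT_TOPICS else 5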
>>
>>         2) PUSH/PULL TCP sockets - Pieter suggested we look at
>>         ROUTER/DEALER as an option once 1) is resolved - this socket
>>         type pairing has some interesting features which would help
>>         with resilience and availability, including heartbeating.
>>
>>
>>     Using PUSH/PULL does not eliminate the possibility of being fully
>>     connected, nor is it incompatible with persistent connections. If
>>     you're not going to be fully-connected, there isn't much
>>     advantage to long-lived persistent connections and without those
>>     persistent connections, you're not benefitting from features such
>>     as heartbeating.
>>
>     How about REQ/REP? I think it is appropriate for long-lived
>     persistent connections and also provides reliability, since every
>     request gets a reply.
>>     I'm not saying ROUTER/DEALER cannot be used, but use them with
>>     care. They're designed for long-lived channels between hosts and
>>     not for the ephemeral-type connections used in a peer-to-peer
>>     system. Dealing with how to manage timeouts on the client and the
>>     server and the swelling number of active file descriptors that
>>     you'll get by using ROUTER/DEALER is not trivial, assuming you
>>     can get past the management of all of those synchronous sockets
>>     (hidden away by tons of eventlet greenthreads)...
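>>
>>     To make the timeout point concrete, this is roughly the kind of
>>     client-side handling every caller ends up writing (a hedged pyzmq
>>     sketch; the five-second limit and the framing are assumptions):
>>
>>         import zmq
>>
>>         def call_with_timeout(context, address, request, timeout_ms=5000):
>>             # DEALER lets callers pipeline requests, but nothing tells
>>             # them a peer has died; each caller has to poll and decide
>>             # for itself when to give up.
>>             sock = context.socket(zmq.DEALER)
>>             sock.connect(address)
>>             sock.send_multipart([b'', request])
>>
>>             poller = zmq.Poller()
>>             poller.register(sock, zmq.POLLIN)
>>             try:
>>                 if poller.poll(timeout_ms):
>>                     _empty, reply = sock.recv_multipart()
>>                     return reply
>>                 raise RuntimeError('no reply within %d ms' % timeout_ms)
>>             finally:
>>                 sock.close()   # otherwise the fd lingers with the context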
>>
>>     Extra anecdote: During a conversation at the OpenStack summit,
>>     someone told me about their experiences using ZeroMQ and the pain
>>     of using REQ/REP sockets and how they felt it was a mistake they
>>     used them. We discussed a bit about some other problems such as
>>     the fact it's impossible to avoid TCP fragmentation unless you
>>     force all frames to 552 bytes or have a well-managed network
>>     where you know the MTUs of all the devices you'll pass through.
>>     Suggestions were made to make ZeroMQ better, until we realized we
>>     had just described TCP-over-ZeroMQ-over-TCP, finished our beers,
>>     and quickly changed topics.
>     Well, it seems I need to take my last question back. In our
>     deployment, I always take advantage of jumbo frames to increase
>     throughput. You said that REQ/REP would introduce TCP
>     fragmentation unless ZeroMQ frames are forced to 552 bytes? Could
>     you please elaborate?
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
