[openstack-dev] Zero MQ remove central broker. Architecture change.
Li Ma
skywalker.nick at gmail.com
Thu Nov 20 02:32:23 UTC 2014
Hi Yatin,
Thanks for sharing your presentation. It looks great. You are welcome to
contribute to the ZeroMQ driver.
Cheers,
Li Ma
On 2014/11/19 12:50, yatin kumbhare wrote:
> Hello Folks,
>
> A couple of slides/diagrams that I documented for my own understanding
> way back in the Havana release. See slide no. 10 onward in particular.
>
> https://docs.google.com/presentation/d/1ZPWKXN7dzXs9bX3Ref9fPDiia912zsHCHNMh_VSMhJs/edit#slide=id.p
>
> I am also committed to using ZeroMQ, as it is lightweight, fast, and scalable.
>
> I would like to chip in on further ZeroMQ development.
>
> Regards,
> Yatin
>
> On Wed, Nov 19, 2014 at 8:05 AM, Li Ma <skywalker.nick at gmail.com> wrote:
>
>
> On 2014/11/19 1:49, Eric Windisch wrote:
>>
>> I think for this cycle we really do need to focus on consolidating
>> and testing the existing driver design and fixing up the biggest
>> deficiency (1) before we consider moving forward with lots of new
>>
>>
>> +1
>>
>> 1) Outbound messaging connection re-use - right now every outbound
>> message creates and consumes a TCP connection - this approach scales
>> badly when Neutron does large fanout casts.
>>
>>
>>
>> I'm glad you are looking at this; by doing so, you will understand
>> the system better. I hope the following will give some insight into,
>> at least, why I made the decisions I made:
>> This was an intentional design trade-off. I saw three choices
>> here: build a fully decentralized solution, build a
>> fully-connected network, or use centralized brokerage. I wrote
>> off centralized brokerage immediately. The problem with a fully
>> connected system is that active TCP connections are required
>> between all of the nodes. I didn't think that would scale and
>> would be brittle against floods (intentional or otherwise).
>>
>> IMHO, I always felt the right solution for large fanout casts was
>> to use multicast. When the driver was written, Neutron didn't
>> exist and there was no use-case for large fanout casts, so I
>> didn't implement multicast, but knew it as an option if it became
>> necessary. It isn't the right solution for everyone, of course.
>>
> Using multicast adds complexity to the switch forwarding plane, which
> has to enable and maintain multicast group communication. For a large
> deployment scenario, I prefer to keep forwarding simple and easy to
> maintain. IMO, running a set of fanout-router processes in the cluster
> can also achieve the goal. The data path is: openstack-daemon --(send
> the message with fanout=true)--> fanout-router --(read the
> matchmaker)--> send to the destinations.
> In effect, it just uses unicast to simulate multicast.
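
To make the fanout-router idea above concrete, here is a rough sketch of
such a relay written against pyzmq. All of the names (matchmaker_lookup,
the ports, the "compute" topic) are placeholders for illustration, not
the actual ZeroMQ driver or matchmaker API:

import zmq

def matchmaker_lookup(topic):
    # Placeholder for a real matchmaker query (e.g. the Redis matchmaker);
    # a static table is used here purely for illustration.
    return {
        "compute": ["tcp://compute-1:9501", "tcp://compute-2:9501"],
    }.get(topic, [])

def run_fanout_router(bind_addr="tcp://*:9500"):
    ctx = zmq.Context.instance()
    pull = ctx.socket(zmq.PULL)
    pull.bind(bind_addr)
    while True:
        # Expect two frames: the topic and the serialized message body.
        topic, payload = pull.recv_multipart()
        for endpoint in matchmaker_lookup(topic.decode()):
            # A production router would reuse these sockets (see point 1
            # about connection re-use); they are opened per cast here
            # only to keep the sketch short.
            push = ctx.socket(zmq.PUSH)
            push.connect(endpoint)
            push.send_multipart([topic, payload])
            push.close()

if __name__ == "__main__":
    run_fanout_router()

In this shape the daemons only ever unicast to the router, and the router
alone fans the message out to whatever the matchmaker returns.
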
>> For connection reuse, you could manage a pool of connections and
>> keep those connections around for a configurable amount of time,
>> after which they'd expire and be re-opened. This would keep the
>> most actively used connections alive. One problem is that it
>> would make the service more brittle by making it far more
>> susceptible to running out of file descriptors by keeping
>> connections around significantly longer. However, this wouldn't
>> be as brittle as fully-connecting the nodes nor as poorly scalable.
>>
> +1. Setting a large number of fds is not a problem. Because we use a
> socket pool, we can control and keep a fixed number of fds.
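
A minimal sketch of the pooling approach described above, assuming pyzmq:
one cached PUSH socket per endpoint, closed after a configurable idle TTL
so the most active connections stay open while the file-descriptor count
stays bounded. This illustrates the idea only; it is not the driver's
implementation:

import time
import zmq

class ExpiringSocketPool(object):
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self.ctx = zmq.Context.instance()
        self._sockets = {}              # endpoint -> (socket, last_used)

    def get(self, endpoint):
        self._expire()
        entry = self._sockets.get(endpoint)
        sock = entry[0] if entry else None
        if sock is None:
            sock = self.ctx.socket(zmq.PUSH)
            sock.connect(endpoint)
        self._sockets[endpoint] = (sock, time.time())
        return sock

    def _expire(self):
        # Close and forget sockets that have been idle longer than the TTL.
        now = time.time()
        for endpoint, (sock, last_used) in list(self._sockets.items()):
            if now - last_used > self.ttl:
                sock.close()
                del self._sockets[endpoint]
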
>> If OpenStack and oslo.messaging were designed specifically around
>> this message pattern, I might suggest that the library and its
>> applications be aware of high-traffic topics and persist the
>> connections for those topics, while keeping others ephemeral. A good
>> example for Nova: api->scheduler traffic would be persistent, whereas
>> scheduler->compute_node traffic would be ephemeral.
>> Perhaps this is something that could still be added to the library.
>>
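
A hypothetical sketch of that per-topic policy, building on the
ExpiringSocketPool sketched earlier: connections for topics declared
"hot" are reused from the pool, everything else is opened and closed per
cast. PERSISTENT_TOPICS and the cast() helper are made up for
illustration and are not oslo.messaging API:

import zmq

PERSISTENT_TOPICS = {"scheduler"}       # e.g. api -> scheduler traffic

def cast(pool, topic, endpoint, payload):
    # Reuse a pooled, long-lived connection only for high-traffic topics.
    if topic in PERSISTENT_TOPICS:
        sock = pool.get(endpoint)
        sock.send_multipart([topic.encode(), payload])
    else:
        # Ephemeral connection for everything else.
        sock = zmq.Context.instance().socket(zmq.PUSH)
        sock.connect(endpoint)
        sock.send_multipart([topic.encode(), payload])
        sock.close()
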
>> 2) PUSH/PULL TCP sockets - Pieter suggested we look at ROUTER/DEALER
>> as an option once 1) is resolved - this socket type pairing has some
>> interesting features which would help with resilience and
>> availability, including heartbeating.
>>
>>
>> Using PUSH/PULL does not eliminate the possibility of being fully
>> connected, nor is it incompatible with persistent connections. If
>> you're not going to be fully-connected, there isn't much
>> advantage to long-lived persistent connections and without those
>> persistent connections, you're not benefitting from features such
>> as heartbeating.
>>
> How about REQ/REP? I think it is appropriate for long-lived persistent
> connections and also provides reliability, since every request gets a
> reply.
>> I'm not saying ROUTER/DEALER cannot be used, but use them with
>> care. They're designed for long-lived channels between hosts and
>> not for the ephemeral-type connections used in a peer-to-peer
>> system. Dealing with how to manage timeouts on the client and the
>> server and the swelling number of active file descriptors that
>> you'll get by using ROUTER/DEALER is not trivial, assuming you
>> can get past the management of all of those synchronous sockets
>> (hidden away by tons of eventlet greenthreads)...
>>
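
For readers unfamiliar with the pairing under discussion, here is a
minimal pyzmq sketch of ROUTER/DEALER with simple application-level
heartbeats, so the server can notice dead peers on a long-lived
connection. Endpoints, identities, and timing values are illustrative,
and this is not what the driver ships today:

import time
import zmq

HEARTBEAT_IVL = 1.0                     # seconds between client heartbeats

def dealer_client(endpoint="tcp://127.0.0.1:9600", identity=b"worker-1"):
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.DEALER)
    sock.setsockopt(zmq.IDENTITY, identity)
    sock.connect(endpoint)
    while True:
        sock.send_multipart([b"HEARTBEAT"])
        time.sleep(HEARTBEAT_IVL)

def router_server(endpoint="tcp://*:9600", liveness=3 * HEARTBEAT_IVL):
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.ROUTER)
    sock.bind(endpoint)
    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN)
    last_seen = {}                      # peer identity -> last heartbeat time
    while True:
        if poller.poll(timeout=1000):   # milliseconds
            identity, body = sock.recv_multipart()
            last_seen[identity] = time.time()
        # Forget peers that have missed too many heartbeats.
        now = time.time()
        for peer in [p for p, t in last_seen.items() if now - t > liveness]:
            del last_seen[peer]

The caution Eric raises still applies: every such long-lived DEALER
connection holds a file descriptor on both ends, so this only pays off
for channels that are genuinely persistent.
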
>> Extra anecdote: During a conversation at the OpenStack summit,
>> someone told me about their experiences using ZeroMQ and the pain
>> of using REQ/REP sockets and how they felt it was a mistake they
>> used them. We discussed a bit about some other problems such as
>> the fact it's impossible to avoid TCP fragmentation unless you
>> force all frames to 552 bytes or have a well-managed network
>> where you know the MTUs of all the devices you'll pass through.
>> Suggestions were made to make ZeroMQ better, until we realized we
>> had just described TCP-over-ZeroMQ-over-TCP, finished our beers,
>> and quickly changed topics.
> Well, it seems I need to take my last question back. In our
> deployment, I always take advantage of jumbo frames to increase
> throughput. You said that REQ/REP would introduce TCP fragmentation
> unless ZeroMQ frames are 552 bytes? Could you please elaborate?
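
As a rough aid to that discussion, here is the back-of-the-envelope
arithmetic for how large a ZeroMQ frame body can be and still fit in a
single TCP segment. The header sizes assume no IP or TCP options, and
the 2-byte per-frame overhead assumes a short ZMTP frame; the numbers
are illustrative only and make no attempt to reproduce the 552-byte
figure from the anecdote:

IP_HEADER = 20            # bytes, IPv4 header without options
TCP_HEADER = 20           # bytes, TCP header without options
ZMTP_OVERHEAD = 2         # assumption: flags byte + 1-byte length (<256 B frames)

def max_frame_body(mtu):
    """Largest frame body that fits one TCP segment at the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER - ZMTP_OVERHEAD

print(max_frame_body(1500))   # standard Ethernet -> 1458
print(max_frame_body(9000))   # jumbo frames, as in Li Ma's deployment -> 8958
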