[openstack-dev] Zero MQ remove central broker. Architecture change.

Li Ma skywalker.nick at gmail.com
Wed Nov 19 02:35:14 UTC 2014

On 2014/11/19 1:49, Eric Windisch wrote:
>     I think for this cycle we really do need to focus on consolidating and
>     testing the existing driver design and fixing up the biggest
>     deficiency (1) before we consider moving forward with lots of new
> +1
>     1) Outbound messaging connection re-use - right now every outbound
>     messaging creates and consumes a tcp connection - this approach scales
>     badly when neutron does large fanout casts.
> I'm glad you are looking at this and by doing so, will understand the 
> system better. I hope the following will give some insight into, at 
> least, why I made the decisions I made:
> This was an intentional design trade-off. I saw three choices here: 
> build a fully decentralized solution, build a fully-connected network, 
> or use centralized brokerage. I wrote off centralized brokerage 
> immediately. The problem with a fully connected system is that active 
> TCP connections are required between all of the nodes. I didn't think 
> that would scale and would be brittle against floods (intentional or 
> otherwise).
> IMHO, I always felt the right solution for large fanout casts was to 
> use multicast. When the driver was written, Neutron didn't exist and 
> there was no use-case for large fanout casts, so I didn't implement 
> multicast, but knew it as an option if it became necessary. It isn't 
> the right solution for everyone, of course.
Using multicast adds complexity to the switch forwarding plane, which 
has to enable and maintain multicast group communication. For large 
deployments, I prefer to keep forwarding simple and easy to maintain. 
IMO, running a set of fanout-router processes in the cluster can 
achieve the same goal.
The data path is:
    openstack-daemon --[send the message, fanout=true]--> fanout-router
    fanout-router --[read the matchmaker]--> send to the destinations
Actually it just uses unicast to simulate multicast.
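To make the idea concrete, here is a rough sketch of the fanout-router 
step. It is not code from the existing driver: FanoutRouter, the 
dict-based matchmaker and the send_unicast callable are all hypothetical 
names I made up for illustration, and the real transport (for example a 
ZeroMQ socket per destination) is hidden behind send_unicast.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the matchmaker: topic -> hosts listening on it.
Matchmaker = Dict[str, List[str]]


class FanoutRouter:
    """Turns one fanout cast into N unicast sends (illustrative only)."""

    def __init__(self, matchmaker: Matchmaker,
                 send_unicast: Callable[[str, str, bytes], None]) -> None:
        self._matchmaker = matchmaker
        # send_unicast(host, topic, payload) is assumed to wrap the real
        # transport; it is injected so the routing logic stays testable.
        self._send_unicast = send_unicast

    def fanout_cast(self, topic: str, payload: bytes) -> List[str]:
        """Send payload to every host registered for topic.

        Returns the list of hosts the message was sent to.
        """
        hosts = list(self._matchmaker.get(topic, []))
        for host in hosts:
            # One unicast send per destination: multicast is only simulated.
            self._send_unicast(host, topic, payload)
        return hosts
```

The daemon then needs only one connection to a fanout-router, and the 
per-destination sends happen on the router side.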
> For connection reuse, you could manage a pool of connections and keep 
> those connections around for a configurable amount of time, after 
> which they'd expire and be re-opened. This would keep the most 
> actively used connections alive. One problem is that it would make the 
> service more brittle by making it far more susceptible to running out 
> of file descriptors by keeping connections around significantly 
> longer. However, this wouldn't be as brittle as fully-connecting the 
> nodes nor as poorly scalable.
+1. Allowing a large number of fds is not a problem. Because we use a 
socket pool, we can control the number of fds and keep it fixed.
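A minimal sketch of such a pool, under my own assumptions: SocketPool, 
max_size and idle_timeout are hypothetical names, connect is any 
callable that returns an object with a close() method, and I read 
"expire" as an idle timeout so that actively used connections stay 
alive. Expired entries are only closed lazily, when they are requested 
again or evicted.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Tuple


class SocketPool:
    """Bounded pool of outbound connections with an idle timeout."""

    def __init__(self, connect: Callable[[str], Any], max_size: int = 64,
                 idle_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect
        self._max_size = max_size
        self._idle_timeout = idle_timeout
        self._clock = clock
        # address -> (connection, last_used); ordered oldest-used first.
        self._entries: "OrderedDict[str, Tuple[Any, float]]" = OrderedDict()

    def get(self, address: str) -> Any:
        """Return a live connection to address, re-using one if possible."""
        now = self._clock()
        entry = self._entries.pop(address, None)
        if entry is not None:
            conn, last_used = entry
            if now - last_used < self._idle_timeout:
                # Still fresh: re-insert as most recently used.
                self._entries[address] = (conn, now)
                return conn
            conn.close()  # idle for too long: re-open below
        # Cap the number of fds by evicting the least recently used entry.
        while len(self._entries) >= self._max_size:
            _, (oldest, _) = self._entries.popitem(last=False)
            oldest.close()
        conn = self._connect(address)
        self._entries[address] = (conn, now)
        return conn

    def __len__(self) -> int:
        return len(self._entries)
```

With max_size fixed, the pool never holds more than max_size open 
connections, whatever the fanout size is.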
> If OpenStack and oslo.messaging were designed specifically around this 
> message pattern, I might suggest that the library and its applications 
> be aware of high-traffic topics and persist the connections for those 
> topics, while keeping others ephemeral. A good example for Nova would 
> be api->scheduler traffic would be persistent, whereas 
> scheduler->compute_node would be ephemeral.  Perhaps this is something 
> that could still be added to the library.
>     2) PUSH/PULL tcp sockets - Pieter suggested we look at ROUTER/DEALER
>     as an option once 1) is resolved - this socket type pairing has some
>     interesting features which would help with resilience and availability
>     including heartbeating. 
> Using PUSH/PULL does not eliminate the possibility of being fully 
> connected, nor is it incompatible with persistent connections. If 
> you're not going to be fully-connected, there isn't much advantage to 
> long-lived persistent connections and without those persistent 
> connections, you're not benefitting from features such as heartbeating.
How about REQ/REP? I think it is appropriate for long-lived persistent 
connections and also provides reliability because of the reply.
> I'm not saying ROUTER/DEALER cannot be used, but use them with care. 
> They're designed for long-lived channels between hosts and not for the 
> ephemeral-type connections used in a peer-to-peer system. Dealing with 
> how to manage timeouts on the client and the server and the swelling 
> number of active file descriptors that you'll get by using 
> ROUTER/DEALER is not trivial, assuming you can get past the management 
> of all of those synchronous sockets (hidden away by tons of eventlet 
> greenthreads)...
> Extra anecdote: During a conversation at the OpenStack summit, someone 
> told me about their experiences using ZeroMQ and the pain of using 
> REQ/REP sockets and how they felt it was a mistake they used them. We 
> discussed a bit about some other problems such as the fact it's 
> impossible to avoid TCP fragmentation unless you force all frames to 
> 552 bytes or have a well-managed network where you know the MTUs of 
> all the devices you'll pass through. Suggestions were made to make 
> ZeroMQ better, until we realized we had just described 
> TCP-over-ZeroMQ-over-TCP, finished our beers, and quickly changed topics.
Well, it seems I need to take my last question back. In our deployment, 
I always use jumbo frames to increase throughput. You said that REQ/REP 
would introduce TCP fragmentation unless ZeroMQ frames are exactly 552 
bytes? Could you please elaborate?
