[openstack-dev] Zero MQ remove central broker. Architecture change.
Li Ma
skywalker.nick at gmail.com
Wed Nov 19 02:35:14 UTC 2014
On 2014/11/19 1:49, Eric Windisch wrote:
>
> I think for this cycle we really do need to focus on consolidating and
> testing the existing driver design and fixing up the biggest
> deficiency (1) before we consider moving forward with lots of new
>
>
> +1
>
> 1) Outbound messaging connection re-use - right now every outbound
> message creates and consumes a TCP connection - this approach scales
> badly when Neutron does large fanout casts.
>
>
>
> I'm glad you are looking at this and by doing so, will understand the
> system better. I hope the following will give some insight into, at
> least, why I made the decisions I made:
> This was an intentional design trade-off. I saw three choices here:
> build a fully decentralized solution, build a fully-connected network,
> or use centralized brokerage. I wrote off centralized brokerage
> immediately. The problem with a fully connected system is that active
> TCP connections are required between all of the nodes. I didn't think
> that would scale and would be brittle against floods (intentional or
> otherwise).
>
> IMHO, I always felt the right solution for large fanout casts was to
> use multicast. When the driver was written, Neutron didn't exist and
> there was no use-case for large fanout casts, so I didn't implement
> multicast, but knew it as an option if it became necessary. It isn't
> the right solution for everyone, of course.
>
Using multicast adds complexity to the switch forwarding plane, which has
to enable and maintain multicast group communication. For a large
deployment scenario, I prefer to keep forwarding simple and
easy to maintain. IMO, running a set of fanout-router processes in the
cluster can also achieve the goal.
The data path is: openstack-daemon --(send the message with
fanout=true)--> fanout-router --(read the matchmaker)-->
send to the destinations.
Actually it just uses unicast to simulate multicast.
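A rough sketch of such a fanout-router, assuming pyzmq and stubbing the
matchmaker lookup with a plain dict (the real driver would consult the
oslo.messaging matchmaker; all addresses below are made up), could look
like this:

    # fanout_router.py - illustrative sketch only, not the actual driver code.
    import json
    import zmq

    # Stub for the matchmaker: topic -> list of subscriber addresses.
    MATCHMAKER = {
        "compute": ["tcp://10.0.0.11:9501", "tcp://10.0.0.12:9501"],
    }

    def main():
        ctx = zmq.Context()
        inbound = ctx.socket(zmq.PULL)        # daemons push fanout casts here
        inbound.bind("tcp://*:9500")
        while True:
            raw = inbound.recv()
            msg = json.loads(raw)
            if not msg.get("fanout"):
                continue                      # only fanout casts are routed here
            # unicast the same payload to every subscriber of the topic
            for address in MATCHMAKER.get(msg.get("topic"), []):
                out = ctx.socket(zmq.PUSH)
                out.connect(address)
                out.send(raw)
                out.close()

    if __name__ == "__main__":
        main()

In practice the outbound sockets would be pooled rather than opened per
message, which ties back to point (1) above.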
> For connection reuse, you could manage a pool of connections and keep
> those connections around for a configurable amount of time, after
> which they'd expire and be re-opened. This would keep the most
> actively used connections alive. One problem is that it would make the
> service more brittle by making it far more susceptible to running out
> of file descriptors by keeping connections around significantly
> longer. However, this wouldn't be as brittle as fully-connecting the
> nodes nor as poorly scalable.
>
+1. Setting a large number of fds is not a problem. Because we use a
socket pool, we can control the fd count and keep it fixed.
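For illustration, a time-bounded pool along those lines might look like
the following minimal sketch (assuming pyzmq; the class and parameter
names are made up, not oslo.messaging API):

    import time
    import zmq

    class SocketPool(object):
        # Reuse outbound PUSH sockets for a configurable TTL, then expire
        # and re-open them, so hot connections stay alive while the fd
        # count stays bounded by the number of distinct peers.
        def __init__(self, ctx, ttl=60):
            self.ctx = ctx
            self.ttl = ttl
            self._pool = {}               # address -> (socket, created_at)

        def get(self, address):
            sock, created = self._pool.get(address, (None, 0))
            if sock is None or time.time() - created > self.ttl:
                if sock is not None:
                    sock.close()          # expired: close and re-open
                sock = self.ctx.socket(zmq.PUSH)
                sock.connect(address)
                self._pool[address] = (sock, time.time())
            return sock

A call path like pool.get("tcp://10.0.0.11:9501").send(payload) would then
keep hitting the same cached socket for the hottest peers.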
> If OpenStack and oslo.messaging were designed specifically around this
> message pattern, I might suggest that the library and its applications
> be aware of high-traffic topics and persist the connections for those
> topics, while keeping others ephemeral. A good example for Nova would
> be api->scheduler traffic would be persistent, whereas
> scheduler->compute_node would be ephemeral. Perhaps this is something
> that could still be added to the library.
>
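To make that concrete, a per-topic policy on top of a pool object like the
SocketPool sketched above might look like this (the topic list and names
are hypothetical, not anything in oslo.messaging today):

    import zmq

    # Hypothetical set of high-traffic topics that warrant persistent sockets.
    PERSISTENT_TOPICS = frozenset(["scheduler"])

    def send(ctx, pool, topic, address, payload):
        if topic in PERSISTENT_TOPICS:
            # long-lived pooled socket, e.g. api -> scheduler traffic
            pool.get(address).send(payload)
        else:
            # ephemeral one-shot socket, e.g. scheduler -> compute_node casts
            sock = ctx.socket(zmq.PUSH)
            sock.connect(address)
            sock.send(payload)
            sock.close()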
> 2) PUSH/PULL tcp sockets - Pieter suggested we look at ROUTER/DEALER
> as an option once 1) is resolved - this socket type pairing has some
> interesting features which would help with resilience and availability
> including heartbeating.
>
>
> Using PUSH/PULL does not eliminate the possibility of being fully
> connected, nor is it incompatible with persistent connections. If
> you're not going to be fully-connected, there isn't much advantage to
> long-lived persistent connections and without those persistent
> connections, you're not benefitting from features such as heartbeating.
>
How about REQ/REP? I think it is appropriate for long-lived persistent
connections and also provides reliability, because every request gets a reply.
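For reference, the basic REQ/REP round trip with pyzmq is just (endpoints
and payloads below are made up):

    import zmq

    ctx = zmq.Context()

    rep = ctx.socket(zmq.REP)                # "server" side
    rep.bind("tcp://127.0.0.1:9600")

    req = ctx.socket(zmq.REQ)                # "client" side
    req.connect("tcp://127.0.0.1:9600")

    req.send(b"cast:compute.run_instance")   # request
    print(rep.recv())                        # server sees the request
    rep.send(b"ack")                         # the reply doubles as an ack
    print(req.recv())                        # client unblocks on the reply

    # Note: sending twice on req without an intervening recv raises an
    # EFSM error; the REQ/REP pair is strictly lockstep, which is part of
    # the pain described below.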
> I'm not saying ROUTER/DEALER cannot be used, but use them with care.
> They're designed for long-lived channels between hosts and not for the
> ephemeral-type connections used in a peer-to-peer system. Dealing with
> how to manage timeouts on the client and the server and the swelling
> number of active file descriptors that you'll get by using
> ROUTER/DEALER is not trivial, assuming you can get past the management
> of all of those synchronous sockets (hidden away by tons of eventlet
> greenthreads)...
>
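For completeness, the ROUTER/DEALER pattern being discussed looks roughly
like this (assuming pyzmq; the identity and endpoint are made up). It shows
the appeal - no lockstep, long-lived identity-addressed channels - and also
why each ROUTER ends up holding per-peer state and fds:

    import zmq

    ctx = zmq.Context()

    router = ctx.socket(zmq.ROUTER)          # server end, tracks peer identities
    router.bind("tcp://127.0.0.1:9700")

    dealer = ctx.socket(zmq.DEALER)          # client end, fully asynchronous
    dealer.setsockopt(zmq.IDENTITY, b"compute-1")
    dealer.connect("tcp://127.0.0.1:9700")

    dealer.send(b"ping")                     # no lockstep: can send repeatedly
    identity, payload = router.recv_multipart()
    router.send_multipart([identity, b"pong"])   # route reply back by identity
    print(dealer.recv())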
> Extra anecdote: During a conversation at the OpenStack summit, someone
> told me about their experiences using ZeroMQ and the pain of using
> REQ/REP sockets and how they felt it was a mistake they used them. We
> discussed a bit about some other problems such as the fact it's
> impossible to avoid TCP fragmentation unless you force all frames to
> 552 bytes or have a well-managed network where you know the MTUs of
> all the devices you'll pass through. Suggestions were made to make
> ZeroMQ better, until we realized we had just described
> TCP-over-ZeroMQ-over-TCP, finished our beers, and quickly changed topics.
Well, it seems I need to take my last question back. In our deployment, I
always take advantage of jumbo frames to increase throughput. You said
that REQ/REP would introduce TCP fragmentation unless ZeroMQ frames are
forced to 552 bytes? Could you please elaborate?