<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I think for this cycle we really do need to focus on consolidating and<br>testing the existing driver design and fixing up the biggest<br>deficiency (1) before we consider moving forward with lots of new</blockquote></div><div><br></div><div>+1</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
1) Outbound messaging connection re-use - right now every outbound<br>
message creates and consumes a tcp connection - this approach scales<br>
badly when neutron does large fanout casts.<br></blockquote><div><br></div><div><br></div><div>I'm glad you are looking at this; by doing so, you'll come to understand the system better. I hope the following gives at least some insight into why I made the decisions I made:</div><div> </div><div>This was an intentional design trade-off. I saw three choices here: build a fully decentralized solution, build a fully connected network, or use centralized brokerage. I wrote off centralized brokerage immediately. The problem with a fully connected system is that active TCP connections are required between all of the nodes; I didn't think that would scale, and it would be brittle against floods (intentional or otherwise).</div><div><br></div><div>IMHO, I always felt the right solution for large fanout casts was to use multicast. When the driver was written, Neutron didn't exist and there was no use-case for large fanout casts, so I didn't implement multicast, but knew it was an option if it became necessary. It isn't the right solution for everyone, of course. (A rough sketch of what a multicast fanout could look like is at the bottom of this mail.)</div><div><br></div><div>For connection reuse, you could manage a pool of connections and keep those connections around for a configurable amount of time, after which they'd expire and be re-opened. This would keep the most actively used connections alive. One problem is that keeping connections around significantly longer makes the service more brittle: it becomes far more susceptible to running out of file descriptors. However, this wouldn't be as brittle as fully connecting the nodes, nor would it scale as poorly.</div><div><br></div><div>If OpenStack and oslo.messaging were designed specifically around this message pattern, I might suggest that the library and its applications be aware of high-traffic topics and persist the connections for those topics, while keeping others ephemeral. A good example for Nova: api->scheduler traffic would be persistent, whereas scheduler->compute_node traffic would be ephemeral. Perhaps this is something that could still be added to the library; the sketch just below shows roughly what I mean.</div><div><br></div>
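<div>To make that a bit more concrete, here is a rough sketch of what an expiring pool with a few persistent topics might look like. This is illustration only, not driver code: it assumes pyzmq, the class and attribute names and the default TTL are made up, and access to the pool would still need to be serialized (for example by confining it to a single eventlet green thread).</div><div><br></div><pre>
# Rough sketch only -- not oslo.messaging code. Assumes pyzmq; the names and
# the default TTL are made up for illustration.
import time

import zmq


class ExpiringConnectionPool(object):
    """Reuse outbound PUSH sockets per endpoint, expiring idle ones."""

    def __init__(self, context, ttl=30.0, persistent_topics=()):
        self.context = context
        self.ttl = ttl                            # seconds an idle socket may live
        self.persistent = set(persistent_topics)  # e.g. {'scheduler'}
        self.sockets = {}                         # (topic, endpoint) -> [socket, last_used]

    def get(self, topic, endpoint):
        """Return a connected PUSH socket, creating it on first use."""
        key = (topic, endpoint)
        entry = self.sockets.get(key)
        if entry is None:
            sock = self.context.socket(zmq.PUSH)
            sock.connect(endpoint)
            entry = self.sockets[key] = [sock, time.time()]
        entry[1] = time.time()
        return entry[0]

    def reap(self):
        """Close sockets that have sat idle longer than the TTL."""
        now = time.time()
        for key, (sock, last_used) in list(self.sockets.items()):
            topic, _endpoint = key
            if topic in self.persistent:
                continue                          # high-traffic topics stay connected
            if now - last_used > self.ttl:
                sock.close()
                del self.sockets[key]
</pre><div><br></div><div>A housekeeping green thread would call reap() periodically; that keeps the most actively used connections alive while letting the rest expire. The trade-off above still applies, though: every entry in that dictionary is an open file descriptor.</div><div><br></div>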
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">2) PUSH/PULL tcp sockets - Pieter suggested we look at ROUTER/DEALER<br>as an option once 1) is resolved - this socket type pairing has some<br>interesting features which would help with resilience and availability<br>including heartbeating. </blockquote><div><br></div><div>Using PUSH/PULL does not eliminate the possibility of being fully connected, nor is it incompatible with persistent connections. If you're not going to be fully connected, there isn't much advantage to long-lived persistent connections, and without those persistent connections you're not benefiting from features such as heartbeating.</div><div><br></div><div>I'm not saying ROUTER/DEALER cannot be used, but use them with care. They're designed for long-lived channels between hosts, not for the ephemeral connections used in a peer-to-peer system. Dealing with timeouts on the client and the server, and with the swelling number of active file descriptors you'll get by using ROUTER/DEALER, is not trivial, assuming you can get past managing all of those synchronous sockets (hidden away by tons of eventlet greenthreads)...</div><div><br></div><div>Extra anecdote: during a conversation at the OpenStack summit, someone told me about their experience with ZeroMQ, the pain of using REQ/REP sockets, and how they felt it was a mistake to have used them. We discussed some other problems, such as the fact that it's impossible to avoid fragmentation unless you force all frames down to 552 bytes or have a well-managed network where you know the MTUs of all the devices you'll pass through. Suggestions were made for making ZeroMQ better, until we realized we had just described TCP-over-ZeroMQ-over-TCP, finished our beers, and quickly changed topics.<br></div>
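<div><br></div><div>P.S. On the multicast point above: libzmq can do multicast fanout with PUB/SUB over its pgm/epgm transports, provided it was built with OpenPGM support. The following is only a rough sketch of the idea, not how the current driver works; the interface name, multicast group, port, and rate limit are all placeholders.</div><div><br></div><pre>
# Rough sketch of a multicast fanout -- illustration only, not driver code.
# Requires a libzmq built with OpenPGM support; the interface, multicast
# group, port and rate limit below are placeholders.
import zmq

ctx = zmq.Context()

# Sender: one PUB socket reaches every member of the multicast group, so a
# large fanout cast is a single send rather than N TCP connections.
pub = ctx.socket(zmq.PUB)
pub.setsockopt(zmq.RATE, 10000)        # PGM rate limit, in kilobits/sec
pub.connect("epgm://eth0;239.192.1.1:5557")
pub.send(b"a fanout cast payload")     # dropped for any node not yet joined

# Receiver (on each interested node): join the same group and filter locally.
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")     # no prefix filter, for the sketch
sub.connect("epgm://eth0;239.192.1.1:5557")
# sub.recv() then yields each cast sent to the group.
</pre><div><br></div><div>Whether something like that beats smarter connection reuse depends entirely on the network you're deploying on, which is why I said it isn't the right solution for everyone.</div></div></div></div>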