<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <div class="moz-cite-prefix">On 2014/11/19 1:49, Eric Windisch
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAAZDpLe0cEQrje4P5Ow6DF+YtX8nh5jBMmta4L-X4sNEOq9tZA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I
                think for this cycle we really do need to focus on
                consolidating and<br>
                testing the existing driver design and fixing up the
                biggest<br>
                deficiency (1) before we consider moving forward with
                lots of new</blockquote>
            </div>
            <div><br>
            </div>
            <div>+1</div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">1)
              Outbound messaging connection re-use - right now every
              outbound<br>
              messaging creates and consumes a tcp connection - this
              approach scales<br>
              badly when neutron does large fanout casts.<br>
            </blockquote>
            <div><br>
            </div>
            <div><br>
            </div>
            <div>I'm glad you are looking at this and by doing so, will
              understand the system better. I hope the following will
              give some insight into, at least, why I made the decisions
              I made:</div>
            <div> </div>
            <div>This was an intentional design trade-off. I saw three
              choices here: build a fully decentralized solution, build
              a fully-connected network, or use centralized brokerage. I
              wrote off centralized brokerage immediately. The problem
              with a fully connected system is that active TCP
              connections are required between all of the nodes. I
              didn't think that would scale and would be brittle against
              floods (intentional or otherwise).</div>
            <div><br>
            </div>
            <div>IMHO, I always felt the right solution for large fanout
              casts was to use multicast. When the driver was written,
              Neutron didn't exist and there was no use-case for large
              fanout casts, so I didn't implement multicast, but knew it
              as an option if it became necessary. It isn't the right
              solution for everyone, of course.</div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    Using multicast adds complexity to the switch forwarding plane,
    which must enable and maintain multicast group communication. For
    large deployments, I prefer to keep forwarding simple and easy to
    maintain. IMO, running a set of fanout-router processes in the
    cluster can achieve the same goal.<br>
    The data path is: openstack-daemon --------send the message (with
    fanout=true) ---------> fanout-router -----read the
    matchmaker------> send to the destinations<br>
    In effect, this uses unicast to simulate multicast.<br>
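    A minimal sketch of that routing step, assuming a hypothetical
    matchmaker table and a pluggable unicast send callable (all names
    here are invented for illustration, not oslo.messaging API):

```python
# Sketch of a fanout-router that simulates multicast with unicast sends.
# The matchmaker maps a topic to the set of hosts currently registered
# for it; both its contents and the transport are hypothetical.

MATCHMAKER = {
    "notifications": ["host-a:9501", "host-b:9501", "host-c:9501"],
}

def route_fanout(message, topic, send):
    """Replicate one fanout message to every host registered for `topic`.

    `send(dest, message)` is any unicast transport (e.g. a PUSH socket
    per destination); the router itself only does the matchmaker lookup
    and the per-destination loop. Returns the number of deliveries.
    """
    destinations = MATCHMAKER.get(topic, [])
    for dest in destinations:
        send(dest, message)
    return len(destinations)
```

    The openstack-daemon then sends a single message to the router
    instead of opening one connection per destination itself.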
    <blockquote
cite="mid:CAAZDpLe0cEQrje4P5Ow6DF+YtX8nh5jBMmta4L-X4sNEOq9tZA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>For connection reuse, you could manage a pool of
              connections and keep those connections around for a
              configurable amount of time, after which they'd expire and
              be re-opened. This would keep the most actively used
              connections alive. One problem is that it would make the
              service more brittle by making it far more susceptible to
              running out of file descriptors by keeping connections
              around significantly longer. However, this wouldn't be as
              brittle as fully-connecting the nodes nor as poorly
              scalable.</div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    +1. Setting a large fd limit is not a problem. Because we use a
    socket pool, we can cap the number of fds at a fixed size.<br>
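    A sketch of such a pool, with both a size cap and the idle-expiry
    Eric described (the `connect` factory and its `close()` method are
    hypothetical stand-ins for opening and closing a real connection):

```python
import time

class ConnectionPool:
    """Keep at most `max_size` connections alive, expiring ones idle
    longer than `ttl` seconds, so the fd count stays bounded while the
    most actively used connections are reused."""

    def __init__(self, connect, max_size=64, ttl=30.0):
        self._connect = connect      # dest -> connection object with close()
        self._max_size = max_size
        self._ttl = ttl
        self._pool = {}              # dest -> (connection, last_used)

    def get(self, dest):
        now = time.monotonic()
        # Expire connections that have been idle longer than ttl.
        for d, (conn, last) in list(self._pool.items()):
            if now - last > self._ttl:
                conn.close()
                del self._pool[d]
        if dest in self._pool:
            conn, _ = self._pool[dest]
        else:
            if len(self._pool) >= self._max_size:
                # Evict the least recently used entry to stay in budget.
                lru = min(self._pool, key=lambda d: self._pool[d][1])
                self._pool[lru][0].close()
                del self._pool[lru]
            conn = self._connect(dest)
        self._pool[dest] = (conn, now)
        return conn
```

    With `max_size` pinned below the process fd limit, reuse can be
    added without the brittleness Eric warns about.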
    <blockquote
cite="mid:CAAZDpLe0cEQrje4P5Ow6DF+YtX8nh5jBMmta4L-X4sNEOq9tZA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>If OpenStack and oslo.messaging were designed
              specifically around this message pattern, I might suggest
              that the library and its applications be aware of
              high-traffic topics and persist the connections for those
              topics, while keeping others ephemeral. A good example for
              Nova would be api->scheduler traffic would be
              persistent, whereas scheduler->compute_node would be
              ephemeral.  Perhaps this is something that could still be
              added to the library.</div>
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">2)
              PUSH/PULL tcp sockets - Pieter suggested we look at
              ROUTER/DEALER<br>
              as an option once 1) is resolved - this socket type
              pairing has some<br>
              interesting features which would help with resilience and
              availability<br>
              including heartbeating. </blockquote>
            <div><br>
            </div>
            <div>Using PUSH/PULL does not eliminate the possibility of
              being fully connected, nor is it incompatible with
              persistent connections. If you're not going to be
              fully-connected, there isn't much advantage to long-lived
              persistent connections and without those persistent
              connections, you're not benefitting from features such as
              heartbeating.</div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    How about REQ/REP? I think it is appropriate for long-lived
    persistent connections, and it also provides reliability, since
    every request gets a reply.<br>
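    The reply can double as an acknowledgement, giving a simple retry
    loop; a sketch, where `transport` is a hypothetical wrapper around
    a strict send/recv pair (roughly the "Lazy Pirate" pattern from the
    ZeroMQ guide, since a REQ socket must be reset after a lost reply):

```python
def request_with_ack(transport, message, retries=3):
    """Send `message` and treat the REQ/REP reply as an ack.

    `transport.send_recv(message)` returns the reply or raises
    TimeoutError; on timeout we reconnect (resetting the strict
    send/recv state) and retry.
    """
    for _attempt in range(retries):
        try:
            return transport.send_recv(message)
        except TimeoutError:
            transport.reconnect()
    raise TimeoutError("no reply after %d attempts" % retries)
```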
    <blockquote
cite="mid:CAAZDpLe0cEQrje4P5Ow6DF+YtX8nh5jBMmta4L-X4sNEOq9tZA@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>I'm not saying ROUTER/DEALER cannot be used, but use
              them with care. They're designed for long-lived channels
              between hosts and not for the ephemeral-type connections
              used in a peer-to-peer system. Dealing with how to manage
              timeouts on the client and the server and the swelling
              number of active file descriptors that you'll get by
              using ROUTER/DEALER is not trivial, assuming you can get
              past the management of all of those synchronous sockets
              (hidden away by tons of eventlet greenthreads)...</div>
            <div><br>
            </div>
            <div>Extra anecdote: During a conversation at the OpenStack
              summit, someone told me about their experiences using
              ZeroMQ and the pain of using REQ/REP sockets and how they
              felt it was a mistake they used them. We discussed a bit
              about some other problems such as the fact it's impossible
              to avoid TCP fragmentation unless you force all frames to
              552 bytes or have a well-managed network where you know
              the MTUs of all the devices you'll pass through.
              Suggestions were made to make ZeroMQ better, until we
              realized we had just described TCP-over-ZeroMQ-over-TCP,
              finished our beers, and quickly changed topics.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    Well, it seems I need to take my last question back. In our
    deployment, I always take advantage of jumbo frames to increase
    throughput. You said that REQ/REP would introduce TCP fragmentation
    unless ZeroMQ frames are forced to 552 bytes? Could you please
    elaborate?<br>
    <blockquote
cite="mid:CAAZDpLe0cEQrje4P5Ow6DF+YtX8nh5jBMmta4L-X4sNEOq9tZA@mail.gmail.com"
      type="cite">
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
OpenStack-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a>
<a class="moz-txt-link-freetext" href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>