<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 01/28/2015 12:51 PM, Kevin Benton

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAO_F6JMXCsGSfFeB1F3=z29N5Fa2oXJZTkGWtXc5Dc11pfUVQQ@mail.gmail.com"

      type="cite">

      <p dir="ltr">If we are going to ignore the IP address changing

        use-case, can we just make the default infinity? Then nobody

        ever has to worry about control plane outages for existing

        clients. 24 hours is way too long to be useful anyway. </p>

    </blockquote>

    <br>

    Why would users want to change an active port's IP address anyway? 

    I can see possible use in changing an inactive port's IP address,

    but that wouldn't cause the dhcp issues mentioned here.  I worry

    about setting a default config value to handle a very unusual use

    case.<br>

    <br>

    Chuck<br>

    <br>

    <br>

    <blockquote

cite="mid:CAO_F6JMXCsGSfFeB1F3=z29N5Fa2oXJZTkGWtXc5Dc11pfUVQQ@mail.gmail.com"

      type="cite">

      <div class="gmail_quote">On Jan 28, 2015 12:44 PM, "Salvatore

        Orlando" &lt;<a moz-do-not-send="true"
          href="mailto:sorlando@nicira.com">sorlando@nicira.com</a>&gt;

        wrote:<br type="attribution">

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          <div dir="ltr"><br>

            <div class="gmail_extra"><br>

              <div class="gmail_quote">On 28 January 2015 at 20:19,

                Brian Haley <span dir="ltr">&lt;<a
                    moz-do-not-send="true"
                    href="mailto:brian.haley@hp.com" target="_blank">brian.haley@hp.com</a>&gt;</span>

                wrote:<br>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi

                  Kevin,<br>

                  <span><br>

                    On 01/28/2015 03:50 AM, Kevin Benton wrote:<br>

                    > Hi,<br>

                    ><br>

                    > Approximately a year and a half ago, the

                    default DHCP lease time in Neutron was<br>

                    > increased from 120 seconds to 86400 seconds.[1]

                    This was done with the goal of<br>

                    > reducing DHCP traffic with very little

                    discussion (based on what I can see in<br>

                    > the review and bug report). While it does

                    indeed reduce DHCP traffic, I don't<br>

                    > think any bug reports were filed showing that a

                    120 second lease time resulted<br>

                    > in too much traffic or that a jump all of the

                    way to 86400 seconds was required<br>

                    > instead of a value in the same order of

                    magnitude.<br>

                    ><br>

                    > Why does this matter?<br>

                    ><br>

                    > Neutron ports can be updated with a new IP

                    address from the same subnet or<br>

                    > another subnet on the same network. The port

                    update will result in anti-spoofing<br>

                    > iptables rule changes that immediately stop the

                    old IP address from working on<br>

                    > the host. This means the host is unreachable

                    for 0-12 hours based on the current<br>

                    > default lease time without manual

                    intervention[2] (assuming half-lease length<br>

                    > DHCP renewal attempts).<br>

                    <br>

                  </span>So I'll first comment on the problem.  You're

                  essentially "pulling the rug" out<br>

                  from under these VMs by changing their IP (and that of

                  their router and DHCP/DNS<br>

                  server), but you expect they should fail quickly and

                  come right back online.  In<br>

                  a non-Neutron environment wouldn't the IT person that

                  did this need some pretty<br>

                  good heat-resistant pants for all the flames from

                  pissed-off users?  Sure, the<br>

                  guy on his laptop will just bounce the connection, but

                  servers (aka VMs) should<br>

                  stay pretty static.  VMs are servers (and cows

                  according to some).<br>

                </blockquote>

                <div><br>

                </div>

                <div>I actually expect this kind of operation not to be one

                  Neutron users will do very often, mostly because

                  regardless of whether you're in the cloud or not,

                  you'd still need to wear those heat resistant pants.</div>

                <div> </div>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>

                  The correct solution is to be able to renumber the

                  network so there is no issue<br>

                  with the anti-spoofing rules dropping packets, or the

                  VMs having an unreachable<br>

                  IP address, but that's a much bigger nut to crack.<br>

                </blockquote>

                <div><br>

                </div>

                <div>Indeed. In my opinion the "update IP" operation

                  sets false expectations for users. I have considered

                  disallowing PUT on fixed_ips in the past but that did

                  not go ahead because there were users leveraging it.</div>

                <div> </div>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span><br>

                    > Why is this on the mailing list?<br>

                    ><br>

                    > In an attempt to make the VMs usable in a much

                    shorter timeframe following a<br>

                    > Neutron port address change, I submitted a

                    patch to reduce the default DHCP<br>

                    > lease time to 8 minutes.[3] However, this was

                    upsetting to several people,[4] so<br>

                    > it was suggested I bring this discussion to the

                    mailing list. The following are<br>

                    > the high-level concerns followed by my

                    responses:<br>

                    ><br>

                  </span>>   * 8 minutes is arbitrary<br>

                  >       o Yes, but it's no more arbitrary than 1440

                  minutes. I picked it as an<br>

                  <span>>         interval because it is still 4

                    times larger than the last short value,<br>

                    >         but it still allows VMs to regain

                    connectivity in &lt;5 minutes in the<br>

                    >         event their IP is changed. If someone

                    has a good suggestion for another<br>

                    >         interval based on known dnsmasq QPS

                    limits or some other quantitative<br>

                    >         reason, please chime in here.<br>

                    <br>

                  </span>We run 48 hours as the default in our public

                  cloud, and I did some digging to<br>

                  remind myself of the multiple reasons:<br>

                  <br>

                  1. Too much DHCP traffic.  Sure, only that initial

                  request is broadcast, but<br>

                  dnsmasq is very verbose and loves writing to syslog

                  for everything it does -<br>

                  less is more.  Do a scale test with 10K VMs and you'll

                  quickly find out a large<br>

                  portion of traffic is DHCP RENEWs, and syslog is huge.<br>

                </blockquote>

                <div><br>

                </div>

                <div>This is correct, and something I overlooked in my

                  previous post. Nevertheless I still think that it is

                  really impossible to find an optimal default which is

                  regarded as such by every user. The current default

                  has been chosen mostly for the reason you explain

                  below, and I don't see a strong reason for changing

                  it.</div>
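                <div><br>
                </div>
                <div>As a back-of-the-envelope check on the traffic argument, here is a minimal Python sketch of the average renew rate. It assumes every client renews at half the lease and that renewals are spread evenly over time; the function name is made up for illustration and is not part of Neutron or dnsmasq.</div>
<pre>
```python
def average_renewals_per_second(num_clients: int, lease_seconds: int,
                                renewal_fraction: float = 0.5) -> float:
    """Steady-state DHCP renew rate across all clients.

    Assumption: each client renews once every
    (lease_seconds * renewal_fraction) seconds, and renewals are
    spread evenly over time rather than arriving in bursts.
    """
    return num_clients / (lease_seconds * renewal_fraction)


# 10K VMs is the scale mentioned in this thread; the lease values are
# the ones under discussion (2 minutes, 8 minutes, 24 hours, 48 hours).
for lease in (120, 480, 86400, 172800):
    rate = average_renewals_per_second(10_000, lease)
    print(f"lease={lease:6d}s  renewals/s={rate:8.2f}")
```
</pre>
                <div>Under those assumptions, 10K VMs produce roughly 167 renewals per second with a 120 second lease, about 42 per second with 480 seconds, and well under one per second with 86400 seconds. Real deployments will differ, since renewals are not evenly spread and each one also generates syslog output.</div>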

                <div>  <br>

                </div>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>

                  2. During a control-plane upgrade or outage, having a

                  short DHCP lease time will<br>

                  take all your VMs offline.  The old value of 2 minutes

                  is not a realistic value<br>

                  for an upgrade, and I don't think 8 minutes is much

                  better.  Yes, when DHCP is<br>

                  down you can't boot a new VM, but as long as customers

                  can get to their existing<br>

                  VMs they're pretty happy and won't scream bloody

                  murder.<br>

                </blockquote>

                <div><br>

                </div>

                <div>In our cloud we were continually hit by this. We

                  could not take our dhcp agents out, otherwise all VMs

                  would lose their leases, unless the downtime of the

                  agent was very brief.  </div>

                <div><br>

                </div>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>

                  There's probably more, but those were the top two,

                  with #2 being most important.<br>

                </blockquote>

                <div><br>

                </div>

                <div>Summarizing, I think that Kevin is exposing a real,

                  albeit well-known, problem (sorry about my dhcp release

                  faux pas - I can use jet lag as a justification!), and

                  he's proposing a mitigation to it. On the other hand,

                  this mitigation, as Brian explains, is going to cause

                  real operational issues. Still, we're arguing about the

                  default value for a configuration parameter. I

                  therefore think the best thing that we can do is

                  to state explicitly what happens when setting long or

                  short lease times.</div>

                <div>I expected this to be documented in [1], but it's

                  not. I think that place and neutron.conf might contain

                  this kind of documentation, such as:</div>

                <div><br>

                </div>

                <div>

                  <div><font face="monospace, monospace"># DHCP Lease

                      duration (in seconds). </font></div>

                  <div><font face="monospace, monospace"># Use<span

                        style="white-space:pre-wrap"> -1 to</span> tell

                      dnsmasq to use infinite lease times. <span

                        style="white-space:pre-wrap"> </span></font></div>

                  <div><font face="monospace, monospace">#

                      dhcp_lease_duration = 86400</font></div>

                </div>

                <div><font face="monospace, monospace"># Note that long

                    DHCP leases will result in delays</font></div>

                <div><font face="monospace, monospace"># in instances

                    acquiring updated IP addresses. This</font></div>

                <div><font face="monospace, monospace"># may result in
                    downtime for those instances, as the</font></div>
                <div><font face="monospace, monospace"># anti-spoofing policy
                    will then block all traffic in and out of</font></div>

                <div><font face="monospace, monospace"># them. In order

                    to minimise this downtime window</font></div>

                <div><font face="monospace, monospace"># the lease time

                    should be shorter, for example</font></div>

                <div><font face="monospace, monospace">#

                    dhcp_lease_duration = 480</font><br>

                </div>
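                <div><br>
                </div>
                <div>To put rough numbers on that downtime window, here is a minimal Python sketch. It assumes the client attempts renewal at half the lease (the assumption stated earlier in this thread) and that its last renewal happened just before the port update; the function name is hypothetical and is not part of Neutron or dnsmasq.</div>
<pre>
```python
def worst_case_stale_window(lease_seconds: int,
                            renewal_fraction: float = 0.5) -> float:
    """Upper bound, in seconds, on how long a client keeps its old address.

    Assumptions: the client renewed just before the port's IP was
    changed, and it next contacts the DHCP server after
    lease_seconds * renewal_fraction. A value of -1 is treated as an
    infinite lease, mirroring the dhcp_lease_duration convention
    described in the comment above.
    """
    if lease_seconds == -1:
        return float("inf")
    return lease_seconds * renewal_fraction


# Lease values discussed in this thread: the old default, the
# proposed default, and the current default.
for lease in (120, 480, 86400):
    minutes = worst_case_stale_window(lease) / 60
    print(f"dhcp_lease_duration={lease}: up to {minutes:g} minutes")
```
</pre>
                <div>Under those assumptions the 86400 second default gives a worst case of 12 hours and 480 seconds gives 4 minutes, which matches the figures quoted in this thread. An infinite lease means the client never picks up the new address without manual intervention.</div>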

                <div><br>

                </div>

                <div>However, I would not change the current system

                  default, as this might affect operational systems.</div>

                <div><br>

                </div>

                <div>Apologies again for my stupid dhcp-release note,</div>

                <div>Salvatore</div>

                <div><br>

                </div>

                <div>[1] <a moz-do-not-send="true"

                    href="http://developer.openstack.org/api-ref-networking-v2.html"

                    target="_blank">http://developer.openstack.org/api-ref-networking-v2.html</a></div>

                <div> </div>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>

                  >   * other datacenters use long lease times<br>

                  >       o This is true, but it's not really a valid

                  comparison. In most regular<br>

                  <span>>         datacenters, updating a static DHCP

                    lease has no effect on the data<br>

                    >         plane so it doesn't matter that the

                    client doesn't react for hours/days<br>

                    >         (even with DHCP snooping enabled).

                    However, in Neutron's case, the<br>

                    >         security groups are immediately updated

                    so all traffic using the old<br>

                    >         address is blocked.<br>

                    <br>

                  </span>Yes, and choosing the lease time is a

                  deployment decision that needs to take a<br>

                  lot of things into account.  Like I said, we don't

                  even use the default.  The<br>

                  default should just be a good guess for a standard

                  deployment, not a value that<br>

                  caters towards the edge cases, especially when the

                  value is tunable in neutron.conf.<br>

                  <br>

                  >   * dhcp traffic is scary because it's broadcast<br>

                  >       o ARP traffic is also broadcast and many

                  clients will expire entries every<br>

                  <span>>         5-10 minutes and re-ARP.

                    L2population may be used to prevent ARP<br>

                    >         propagation, so the comparison between

                    DHCP and ARP isn't always<br>

                    >         relevant here.<br>

                    <br>

                  </span>I don't recall anyone being scared of

                  broadcast, and can't find any comments<br>

                  regarding it in <a moz-do-not-send="true"

                    href="https://review.openstack.org/#/c/150595/"

                    target="_blank">https://review.openstack.org/#/c/150595/</a><br>

                  <span><br>

                    > Please reply back with your

                    opinions/anecdotes/data related to short DHCP lease<br>

                    > times.<br>

                    <br>

                  </span>I can only speculate on why 24 hours was chosen

                  as the default back in 2013,<br>

                  possibly because a lot of wireless router firmware

                  defaults are set as such?<br>

                  <span><br>

                    > 1. <a moz-do-not-send="true"

href="https://github.com/openstack/neutron/commit/d9832282cf656b162c51afdefb830dacab72defe"

                      target="_blank">https://github.com/openstack/neutron/commit/d9832282cf656b162c51afdefb830dacab72defe</a><br>

                    > 2. Manual intervention could be an instance

                    reboot, a dhcp client invocation via<br>

                    > the console, or a delayed invocation right

                    before the update. (all significantly<br>

                    > more difficult to script than a simple update

                    of a port's IP via the API).<br>

                    > 3. <a moz-do-not-send="true"

                      href="https://review.openstack.org/#/c/150595/"

                      target="_blank">https://review.openstack.org/#/c/150595/</a><br>

                    > 4. <a moz-do-not-send="true"

                      href="http://i.imgur.com/xtvatkP.jpg"

                      target="_blank">http://i.imgur.com/xtvatkP.jpg</a><br>

                    <br>

                  </span>I was a much bigger baby than that :)<br>

                  <span><font color="#888888"><br>

                      -Brian<br>

                    </font></span>

                  <div>

                    <div><br>

__________________________________________________________________________<br>

                      OpenStack Development Mailing List (not for usage

                      questions)<br>

                      Unsubscribe: <a moz-do-not-send="true"

href="mailto:OpenStack-dev-request@lists.openstack.org?subject:unsubscribe"

                        target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>

                      <a moz-do-not-send="true"

                        href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev"

                        target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

                    </div>

                  </div>

                </blockquote>

              </div>

              <br>

            </div>

          </div>

          <br>


          <br>

        </blockquote>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>