<div dir="ltr">I'm pretty sure I've resolved this issue. Since the problem seems to happen randomly, it might just be a coincidence, but this is by far the longest stretch without it happening. :)<div><br></div><div>I noticed that CentOS 7 and RHEL 7 set `valid_lft` and `preferred_lft` timeouts on the IPv4 address. You can see this by doing an "ip a" on CentOS 7/RHEL 7 and comparing with either CentOS 6 or Ubuntu. This is the first time I've seen this used on IPv4. It's usually used for IPv6 privacy addresses. The timeout is set to something larger than the lease renewal time.</div><div><br></div><div>What happens, though, is that the DHCP renewal occasionally takes a little longer to arrive. Then the `valid_lft` hits zero and the IP is removed from the interface. When this happens, the kernel cleans up any routes used by the removed IP (in this case, the default gateway).</div><div><br></div><div>A few seconds later, the late DHCP renewal is finally received and the IP is added back to the interface. But due to how CentOS 7/RHEL 7 handle the renewal in /usr/sbin/dhclient-script, the gateway is never re-added. </div><div><br></div><div>My guess as to why a newer version of dnsmasq does not exhibit this issue is that it advertises renewals a little differently: enough to trigger the part of dhclient-script that re-adds the gateway. 
I have not verified this theory, though.</div><div><br></div><div>What I've done for now is modify dhclient-script to remove every portion that sets `valid_lft` and `preferred_lft`, so they are now set to "forever", just like on other distros.</div><div><br></div><div>And so far, so good (crossing fingers).</div><div><br></div><div>Thanks,</div><div>Joe</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jan 27, 2015 at 1:53 PM, Joe Topjian <span dir="ltr"><<a href="mailto:joe@topjian.net" target="_blank">joe@topjian.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi George,</div><div><br></div>All instances have only a single interface.<div><br></div><div>Thanks,</div><div>Joe</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jan 27, 2015 at 1:38 PM, George Shuklin <span dir="ltr"><<a href="mailto:george.shuklin@gmail.com" target="_blank">george.shuklin@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
  <div text="#000000" bgcolor="#FFFFFF">

    How many network interfaces does your instance have? If more than one,

    check the settings for the second network (subnet). It can have its own DHCP

    settings, which may interfere with routes for the main network.<div><div><br>

    <br>

    <div>On 01/27/2015 06:08 PM, Joe Topjian

      wrote:<br>

    </div>

    </div></div><blockquote type="cite"><div><div>

      <div dir="ltr">Hello,

        <div><br>

        </div>

        <div>I have run into two different OpenStack clouds where

          instances running either RHEL 7 or CentOS 7 images are

          randomly losing their network gateway.</div>

        <div><br>

        </div>

        <div>There's nothing in the logs that shows any indication of

          why. There's no DHCP hiccup or anything like that. The gateway

          has just disappeared.</div>

        <div><br>

        </div>

        <div>If I log into the instance via another instance (so on the

          same subnet since there's no gateway), I can manually re-add

          the gateway and everything works... until it loses it again.</div>

        <div><br>

        </div>

        <div>One cloud is running Havana and the other is running

          Icehouse. Both are using nova-network and both are Ubuntu

          12.04.</div>

        <div><br>

        </div>

        <div>On the Havana cloud, we decided to install the dnsmasq

          package from Ubuntu 14.04. This looks to have resolved the

          issue as this was back in November and I haven't heard an

          update since.</div>

        <div><br>

        </div>

        <div>However, we don't want to do that just yet on the Icehouse

          cloud. We'd like to understand exactly why this is happening

          and why updating dnsmasq resolves an issue that only one

          specific type of image is having.</div>

        <div><br>

        </div>

        <div>I can make my way around CentOS, but I'm not as familiar

          with it as I am with Ubuntu (especially CentOS 7). Does anyone

          know what change in RHEL7/CentOS7 might be causing this? Or

          does anyone have any other ideas on how to troubleshoot the

          issue? </div>

        <div><br>

        </div>

        <div>I currently have access to two instances in this state, so

          I'd be happy to act as remote hands and eyes. :)</div>

        <div><br>

        </div>

        <div>Thanks,</div>

        <div>Joe</div>

      </div>

      <br>


      <br>

      </div></div><span><pre>_______________________________________________

OpenStack-operators mailing list

<a href="mailto:OpenStack-operators@lists.openstack.org" target="_blank">OpenStack-operators@lists.openstack.org</a>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators</a>

</pre>

    </span></blockquote>

    <br>

  </div>



<br></blockquote></div><br></div>

</div></div></blockquote></div><br></div>
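<div>For anyone who wants to check whether an instance is affected, the "ip a" comparison described at the top of this message could be scripted roughly as follows. This is only a minimal sketch: the function name `finite_lifetimes` and the sample text are made up for illustration (the address and lifetimes are not from a real instance), and it assumes the usual `inet ... valid_lft ... preferred_lft ...` layout printed by `ip -4 addr show`.</div>

```python
import re

# Matches one IPv4 entry: the address, then its valid and preferred lifetimes.
# "inet " (with a trailing space) deliberately skips "inet6" entries.
PATTERN = re.compile(
    r"inet (\S+).*?valid_lft (\S+) preferred_lft (\S+)", re.DOTALL
)

# Hypothetical, trimmed sample in the shape of `ip -4 addr show` output.
SAMPLE = """\
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 10.0.0.5/24 brd 10.0.0.255 scope global dynamic eth0
       valid_lft 71sec preferred_lft 71sec
"""


def finite_lifetimes(ip_output):
    """Return (address, valid_lft, preferred_lft) tuples for every IPv4
    address whose valid lifetime is something other than "forever"."""
    return [
        match.groups()
        for match in PATTERN.finditer(ip_output)
        if match.group(2) != "forever"
    ]


if __name__ == "__main__":
    for address, valid, preferred in finite_lifetimes(SAMPLE):
        print(address, "valid_lft", valid, "preferred_lft", preferred)
```

<div>On a real instance you would feed it the captured output of `ip -4 addr show` instead of `SAMPLE`. Any address it reports is one the kernel will remove if the lifetime runs out before a renewal arrives.</div>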