[openstack-dev] [TripleO] Tis the season...for a cloud reboot
Ben Nemec
openstack at nemebean.com
Tue Dec 19 21:00:08 UTC 2017
On 12/19/2017 02:43 PM, Brian Haley wrote:
> On 12/19/2017 11:53 AM, Ben Nemec wrote:
>> The reboot is done (mostly...see below).
>>
>> On 12/18/2017 05:11 PM, Joe Talerico wrote:
>>> Ben - Can you provide some links to the ovs port exhaustion issue for
>>> some background?
>>
>> I don't know if we ever had a bug opened, but there's some discussion
>> of it in
>> http://lists.openstack.org/pipermail/openstack-dev/2016-December/109182.html
>> I've also copied Derek since I believe he was the one who found it
>> originally.
>>
>> The gist is that after about three months of tripleo-ci running in this
>> cloud, we start to hit errors creating instances because of problems
>> creating OVS ports on the compute nodes. Sometimes we simply see a huge
>> number of ports; other times we see a lot of ports that look like this:
>>
>> Port "qvod2cade14-7c"
>> tag: 4095
>> Interface "qvod2cade14-7c"
>>
>> Notably they all have a tag of 4095, which seems suspicious to me. I
>> don't know whether it's actually an issue though.
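For reference, here's a quick way to pick out ports in that state. This is a sketch only: it parses sample `ovs-vsctl show`-style text so the filtering logic can be checked without a live bridge, since output formatting can vary by OVS version.

```shell
# Sample of `ovs-vsctl show` output standing in for a live bridge; one
# port is stuck on the "dead" VLAN tag 4095, one is wired normally.
sample='    Port "qvod2cade14-7c"
        tag: 4095
        Interface "qvod2cade14-7c"
    Port "qvo11111111-aa"
        tag: 101
        Interface "qvo11111111-aa"'

# Remember the most recent Port line; print its name (quotes stripped)
# whenever a "tag: 4095" line follows it.
echo "$sample" | awk '/Port /{port=$2} /tag: 4095/{gsub(/"/,"",port); print port}'
# -> qvod2cade14-7c
```

On a real compute node you would pipe actual `ovs-vsctl show` output through the same awk filter.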
>
> Tag 4095 is for "dead" OVS ports; it's an unused VLAN tag reserved by
> the agent.
>
> The 'qvo' here shows it's part of the veth pair that os-vif created when
> it plugged in the VM (the other half is 'qvb'); the pair exists so that
> Neutron can apply iptables rules. It's part of the "old" way of doing
> security groups with the OVSHybridIptablesFirewallDriver, and it can
> eventually go away once the OVSFirewallDriver can be used everywhere
> (which requires a newer OVS and agent).
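For concreteness, the driver choice Brian describes is selected in the OVS agent's config. A minimal sketch follows; the file path and the availability of the pure-OVS driver on this Mitaka-era cloud are assumptions:

```ini
# /etc/neutron/plugins/ml2/openvswitch_agent.ini (path varies by distro)
[securitygroup]
# "Old" hybrid path: qvb/qvo veth pair plus iptables on a qbr linux bridge
firewall_driver = iptables_hybrid
# Newer pure-OVS flow-based firewall; needs a conntrack-capable OVS (>= 2.5)
# firewall_driver = openvswitch
```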
>
> I wonder if you can run the ovs_cleanup utility to clean some of these up?
As in neutron-ovs-cleanup? Doesn't that wipe out everything, including
any ports that are still in use? Or is there a different tool I'm not
aware of that can do more targeted cleanup?
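One option for a more targeted cleanup than neutron-ovs-cleanup would be to delete only the ports stuck on tag 4095. The sketch below is untested against this cloud; the bridge name br-int and the `ovs-vsctl find` output shape are assumptions, so the parsing is factored out and demonstrated against sample text.

```shell
# Extract port names from `ovs-vsctl find port tag=4095`-style records.
dead_ports() {
    awk '$1 == "name" { gsub(/"/, "", $3); print $3 }'
}

# Sample of what `ovs-vsctl find port tag=4095` prints, so the parsing
# can be checked without a live OVS:
sample='_uuid               : 1234
name                : "qvod2cade14-7c"
tag                 : 4095'

echo "$sample" | dead_ports
# prints: qvod2cade14-7c

# On a real compute node (review carefully before uncommenting del-port!):
# for p in $(ovs-vsctl find port tag=4095 | dead_ports); do
#     ovs-vsctl del-port br-int "$p"
# done
```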
Oh, also worth noting that I don't think we have os-vif in this cloud
because it's so old. There's no os-vif package installed anyway.
>
> -Brian
>
>> I've had some offline discussions about getting someone on this cloud
>> to debug the problem. Originally we decided not to pursue it since
>> it's not hard to work around and we didn't want to disrupt the
>> environment by trying to move to later OpenStack code (we're still
>> back on Mitaka), but it was pointed out to me this time around that
>> from a downstream perspective we have users on older code as well and
>> it may be worth debugging to make sure they don't hit similar problems.
>>
>> To that end, I've left one compute node un-rebooted for debugging
>> purposes. The downstream discussion is ongoing, but I'll update here
>> if we find anything.
>>
>>>
>>> Thanks,
>>> Joe
>>>
>>> On Mon, Dec 18, 2017 at 10:43 AM, Ben Nemec <openstack at nemebean.com>
>>> wrote:
>>>> Hi,
>>>>
>>>> It's that magical time again. You know the one, when we reboot rh1
>>>> to avoid
>>>> OVS port exhaustion. :-)
>>>>
>>>> If all goes well you won't even notice that this is happening, but
>>>> there is a possibility that a few jobs will fail while the te-broker
>>>> host is rebooted, so I wanted to let everyone know. If you notice that
>>>> anything else hosted in rh1 is down (tripleo.org, zuul-status, etc.),
>>>> let me know. I have been known to forget to restart services after
>>>> the reboot.
>>>>
>>>> I'll send a followup when I'm done.
>>>>
>>>> -Ben
>>>>
>>>> __________________________________________________________________________
>>>>
>>>> OpenStack Development Mailing List (not for usage questions)
>>>> Unsubscribe:
>>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev