[openstack-dev] [TripleO] Tis the season...for a cloud reboot
Brian Haley
haleyb.dev at gmail.com
Tue Dec 19 22:23:16 UTC 2017
On 12/19/2017 04:00 PM, Ben Nemec wrote:
>
>
> On 12/19/2017 02:43 PM, Brian Haley wrote:
>> On 12/19/2017 11:53 AM, Ben Nemec wrote:
>>> The reboot is done (mostly...see below).
>>>
>>> On 12/18/2017 05:11 PM, Joe Talerico wrote:
>>>> Ben - Can you provide some links to the ovs port exhaustion issue for
>>>> some background?
>>>
>>> I don't know if we ever had a bug opened, but there's some discussion
>>> of it in
>>> http://lists.openstack.org/pipermail/openstack-dev/2016-December/109182.html
>>> I've also copied Derek since I believe he was the one who found it
>>> originally.
>>>
>>> The gist is that after about 3 months of tripleo-ci running in this
>>> cloud we start to hit errors creating instances because of problems
>>> creating OVS ports on the compute nodes. Sometimes we see a huge
>>> number of ports in general, other times we see a lot of ports that
>>> look like this:
>>>
>>> Port "qvod2cade14-7c"
>>> tag: 4095
>>> Interface "qvod2cade14-7c"
>>>
>>> Notably they all have a tag of 4095, which seems suspicious to me. I
>>> don't know whether it's actually an issue though.
>>
>> Tag 4095 is for "dead" OVS ports; it's an unused VLAN tag in the agent.
>>
>> The 'qvo' here shows it's part of the VETH pair that os-vif created
>> when it plugged in the VM (the other half is 'qvb'), and they're
>> created so that iptables rules can be applied by neutron. It's part
>> of the "old" way to do security groups with the
>> OVSHybridIptablesFirewallDriver, and can eventually go away once the
>> OVSFirewallDriver can be used everywhere (requires newer OVS and agent).
>>
>> I wonder if you can run the ovs_cleanup utility to clean some of these
>> up?
>
> As in neutron-ovs-cleanup? Doesn't that wipe out everything, including
> any ports that are still in use? Or is there a different tool I'm not
> aware of that can do more targeted cleanup?
Crap, I thought there was an option to clean up just these dead devices.
I should have read the code: it's either neutron ports (the default) or
all ports. Maybe that should be an option.
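
A minimal sketch of what a more targeted check might look like, assuming
you feed it the text printed by "ovs-vsctl show". It only lists ports whose
tag is 4095 and deletes nothing. The function name, the sample text and the
second port in it (qvo1234abcd-ef) are made up for illustration; none of
this is part of neutron-ovs-cleanup.

```python
import re

DEAD_VLAN_TAG = 4095  # tag seen on the "dead" ports discussed in this thread


def find_dead_ports(show_output, dead_tag=DEAD_VLAN_TAG):
    """Return names of ports tagged dead_tag in `ovs-vsctl show` style text."""
    dead = []
    current_port = None
    for line in show_output.splitlines():
        stripped = line.strip()
        # A 'Port' line starts a new port stanza; the name may be quoted.
        port_match = re.match(r'^Port\s+"?([^"\s]+)"?$', stripped)
        if port_match:
            current_port = port_match.group(1)
            continue
        # A 'tag:' line belongs to the most recently seen port.
        tag_match = re.match(r'^tag:\s*(\d+)$', stripped)
        if tag_match and current_port is not None:
            if int(tag_match.group(1)) == dead_tag:
                dead.append(current_port)
    return dead


# Hypothetical sample; only the first port appears in the thread above.
SAMPLE = '''
    Bridge br-int
        Port "qvod2cade14-7c"
            tag: 4095
            Interface "qvod2cade14-7c"
        Port "qvo1234abcd-ef"
            tag: 5
            Interface "qvo1234abcd-ef"
        Port br-int
            Interface br-int
                type: internal
'''

if __name__ == "__main__":
    for name in find_dead_ports(SAMPLE):
        print(name)
```

Actually removing a port would be a separate step (for example with
"ovs-vsctl del-port"), and whether that is safe on a live compute node is
not something this thread establishes.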
-Brian
> Oh, also worth noting that I don't think we have os-vif in this cloud
> because it's so old. There's no os-vif package installed anyway.
>
>>
>> -Brian
>>
>>> I've had some offline discussions about getting someone on this cloud
>>> to debug the problem. Originally we decided not to pursue it since
>>> it's not hard to work around and we didn't want to disrupt the
>>> environment by trying to move to later OpenStack code (we're still
>>> back on Mitaka), but it was pointed out to me this time around that
>>> from a downstream perspective we have users on older code as well and
>>> it may be worth debugging to make sure they don't hit similar problems.
>>>
>>> To that end, I've left one compute node un-rebooted for debugging
>>> purposes. The downstream discussion is ongoing, but I'll update here
>>> if we find anything.
>>>
>>>>
>>>> Thanks,
>>>> Joe
>>>>
>>>> On Mon, Dec 18, 2017 at 10:43 AM, Ben Nemec <openstack at nemebean.com>
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> It's that magical time again. You know the one, when we reboot rh1
>>>>> to avoid
>>>>> OVS port exhaustion. :-)
>>>>>
>>>>> If all goes well you won't even notice that this is happening, but
>>>>> there is
>>>>> the possibility that a few jobs will fail while the te-broker host is
>>>>> rebooted so I wanted to let everyone know. If you notice anything
>>>>> else
>>>>> hosted in rh1 is down (tripleo.org, zuul-status, etc.) let me know.
>>>>> I have
>>>>> been known to forget to restart services after the reboot.
>>>>>
>>>>> I'll send a followup when I'm done.
>>>>>
>>>>> -Ben
>>>>>
>>>>> __________________________________________________________________________
>>>>>
>>>>> OpenStack Development Mailing List (not for usage questions)
>>>>> Unsubscribe:
>>>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>
>>>
>>
>>