[openstack-dev] [TripleO] Tis the season...for a cloud reboot

Brian Haley haleyb.dev at gmail.com
Tue Dec 19 22:23:16 UTC 2017


On 12/19/2017 04:00 PM, Ben Nemec wrote:
> 
> 
> On 12/19/2017 02:43 PM, Brian Haley wrote:
>> On 12/19/2017 11:53 AM, Ben Nemec wrote:
>>> The reboot is done (mostly...see below).
>>>
>>> On 12/18/2017 05:11 PM, Joe Talerico wrote:
>>>> Ben - Can you provide some links to the ovs port exhaustion issue for
>>>> some background?
>>>
>>> I don't know if we ever had a bug opened, but there's some discussion 
>>> of it in 
>>> http://lists.openstack.org/pipermail/openstack-dev/2016-December/109182.html
>>> I've also copied Derek since I believe he was the one who found it
>>> originally.
>>>
>>> The gist is that after about 3 months of tripleo-ci running in this 
>>> cloud we start to hit errors creating instances because of problems 
>>> creating OVS ports on the compute nodes.  Sometimes we see a huge 
>>> number of ports in general, other times we see a lot of ports that 
>>> look like this:
>>>
>>> Port "qvod2cade14-7c"
>>>              tag: 4095
>>>              Interface "qvod2cade14-7c"
>>>
>>> Notably they all have a tag of 4095, which seems suspicious to me.  I 
>>> don't know whether it's actually an issue though.
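
For anyone who wants to check how bad it has gotten on a given compute
node, a couple of quick commands (assuming the integration bridge is the
usual br-int):

    # total number of ports on the integration bridge
    ovs-vsctl list-ports br-int | wc -l

    # just the ports stuck on the "dead" VLAN tag
    ovs-vsctl --columns=name find Port tag=4095
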
>>
>> Tag 4095 is for "dead" OVS ports; it's an unused VLAN tag reserved by the agent.
>>
>> The 'qvo' here shows it's part of the veth pair that os-vif created 
>> when it plugged in the VM (the other half is 'qvb'), and they're 
>> created so that iptables rules can be applied by neutron.  It's part 
>> of the "old" way to do security groups with the 
>> OVSHybridIptablesFirewallDriver, and can eventually go away once the 
>> OVSFirewallDriver can be used everywhere (requires newer OVS and agent).
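
(For reference, switching to the native driver is just a config change on
the compute nodes' OVS agent, roughly the snippet below in
openvswitch_agent.ini, though as noted it needs a newer OVS and agent than
what this cloud is running:)

    [securitygroup]
    # native OVS firewall driver; no qvb/qvo veth pair or host iptables rules
    firewall_driver = openvswitch
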
>>
>> I wonder if you can run the ovs_cleanup utility to clean some of these 
>> up?
> 
> As in neutron-ovs-cleanup?  Doesn't that wipe out everything, including 
> any ports that are still in use?  Or is there a different tool I'm not 
> aware of that can do more targeted cleanup?

Crap, I thought there was an option to just clean up these dead devices.
I should have read the code; it's either neutron ports (the default) or
all ports.  Maybe that should be an option.
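
In the meantime, a rough sketch of a more targeted cleanup (untested; list
the ports first with the find command above and sanity-check before
deleting anything on a live compute node):

    # delete only the ports stuck on the dead VLAN tag (4095)
    for p in $(ovs-vsctl --no-headings --columns=name --format=csv \
                   find Port tag=4095 | tr -d '"'); do
        ovs-vsctl --if-exists del-port "$p"
    done

That only detaches the OVS side of things; any leftover qvb/qvo veth
devices would still need to be cleaned up separately.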

-Brian

> Oh, also worth noting that I don't think we have os-vif in this cloud 
> because it's so old.  There's no os-vif package installed anyway.
> 
>>
>> -Brian
>>
>>> I've had some offline discussions about getting someone on this cloud 
>>> to debug the problem.  Originally we decided not to pursue it since 
>>> it's not hard to work around and we didn't want to disrupt the 
>>> environment by trying to move to later OpenStack code (we're still 
>>> back on Mitaka), but it was pointed out to me this time around that 
>>> from a downstream perspective we have users on older code as well and 
>>> it may be worth debugging to make sure they don't hit similar problems.
>>>
>>> To that end, I've left one compute node un-rebooted for debugging 
>>> purposes.  The downstream discussion is ongoing, but I'll update here 
>>> if we find anything.
>>>
>>>>
>>>> Thanks,
>>>> Joe
>>>>
>>>> On Mon, Dec 18, 2017 at 10:43 AM, Ben Nemec <openstack at nemebean.com> 
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> It's that magical time again.  You know the one, when we reboot rh1
>>>>> to avoid OVS port exhaustion. :-)
>>>>>
>>>>> If all goes well you won't even notice that this is happening, but
>>>>> there is the possibility that a few jobs will fail while the te-broker
>>>>> host is rebooted, so I wanted to let everyone know.  If you notice that
>>>>> anything else hosted in rh1 (tripleo.org, zuul-status, etc.) is down,
>>>>> let me know.  I have been known to forget to restart services after
>>>>> the reboot.
>>>>>
>>>>> I'll send a followup when I'm done.
>>>>>
>>>>> -Ben