[openstack-dev] [TripleO] Tis the season...for a cloud reboot

Ben Nemec openstack at nemebean.com
Tue Dec 19 16:53:05 UTC 2017


The reboot is done (mostly...see below).

On 12/18/2017 05:11 PM, Joe Talerico wrote:
> Ben - Can you provide some links to the ovs port exhaustion issue for
> some background?

I don't know if we ever opened a bug, but there's some discussion of it
here:
http://lists.openstack.org/pipermail/openstack-dev/2016-December/109182.html

I've also copied Derek since I believe he was the one who found it
originally.

The gist is that after about 3 months of tripleo-ci running in this
cloud we start to hit errors creating instances because of problems
creating OVS ports on the compute nodes.  Sometimes we see a huge number
of ports in general; other times we see a lot of ports that look like this:

Port "qvod2cade14-7c"
             tag: 4095
             Interface "qvod2cade14-7c"

Notably they all have a tag of 4095, which seems suspicious to me.  I 
don't know whether it's actually an issue though.
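
For anyone who wants to check their own compute nodes, here is a rough
sketch of how ports carrying that tag could be listed.  It assumes you
have saved text laid out like the snippet above (the layout that
`ovs-vsctl show` prints); the function name and regexes are only
illustrative, not something we run in rh1.

```python
import re

# Matches lines such as:  Port "qvod2cade14-7c"   or   Port br-int
PORT_RE = re.compile(r'Port "?([^"\s]+)"?$')
# Matches lines such as:  tag: 4095
TAG_RE = re.compile(r'tag: (\d+)$')


def ports_with_tag(show_output, tag=4095):
    """Return names of ports whose VLAN tag equals `tag`.

    `show_output` is assumed to be text in the Port/tag/Interface
    layout shown above.
    """
    names = []
    current_port = None
    for line in show_output.splitlines():
        stripped = line.strip()
        port_match = PORT_RE.match(stripped)
        if port_match:
            # A new Port block starts; remember its name.
            current_port = port_match.group(1)
            continue
        tag_match = TAG_RE.match(stripped)
        if tag_match and current_port is not None:
            if int(tag_match.group(1)) == tag:
                names.append(current_port)
    return names
```

Calling `len(ports_with_tag(text))` on a node's saved output gives the
number of such ports, which could be compared over time to see how fast
they accumulate.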

I've had some offline discussions about getting someone on this cloud to
debug the problem.  Originally we decided not to pursue it since it's
not hard to work around and we didn't want to disrupt the environment by
trying to move to later OpenStack code (we're still back on Mitaka).
This time around, though, it was pointed out to me that from a downstream
perspective we have users on older code as well, so it may be worth
debugging to make sure they don't hit similar problems.

To that end, I've left one compute node un-rebooted for debugging 
purposes.  The downstream discussion is ongoing, but I'll update here if 
we find anything.

> 
> Thanks,
> Joe
> 
> On Mon, Dec 18, 2017 at 10:43 AM, Ben Nemec <openstack at nemebean.com> wrote:
>> Hi,
>>
>> It's that magical time again.  You know the one, when we reboot rh1 to avoid
>> OVS port exhaustion. :-)
>>
>> If all goes well you won't even notice that this is happening, but there is
>> the possibility that a few jobs will fail while the te-broker host is
>> rebooted so I wanted to let everyone know.  If you notice anything else
>> hosted in rh1 is down (tripleo.org, zuul-status, etc.) let me know.  I have
>> been known to forget to restart services after the reboot.
>>
>> I'll send a followup when I'm done.
>>
>> -Ben
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


