[openstack-dev] [TripleO] Tis the season...for a cloud reboot

Alex Schultz aschultz at redhat.com
Tue Dec 19 17:34:55 UTC 2017


On Tue, Dec 19, 2017 at 9:53 AM, Ben Nemec <openstack at nemebean.com> wrote:
> The reboot is done (mostly...see below).
>
> On 12/18/2017 05:11 PM, Joe Talerico wrote:
>>
>> Ben - Can you provide some links to the ovs port exhaustion issue for
>> some background?
>
>
> I don't know if we ever had a bug opened, but there's some discussion of it in
> http://lists.openstack.org/pipermail/openstack-dev/2016-December/109182.html
> I've also copied Derek since I believe he was the one who found it
> originally.
>
> The gist is that after about 3 months of tripleo-ci running in this cloud we
> start to hit errors creating instances because of problems creating OVS
> ports on the compute nodes.  Sometimes we see a huge number of ports in
> general, other times we see a lot of ports that look like this:
>
>     Port "qvod2cade14-7c"
>         tag: 4095
>         Interface "qvod2cade14-7c"
>
> Notably they all have a tag of 4095, which seems suspicious to me.  I don't
> know whether it's actually an issue though.
>
> I've had some offline discussions about getting someone on this cloud to
> debug the problem.  Originally we decided not to pursue it, since it's not
> hard to work around and we didn't want to disrupt the environment by trying
> to move to later OpenStack code (we're still back on Mitaka).  This time
> around, though, it was pointed out to me that downstream we have users on
> older code as well, so it may be worth debugging to make sure they don't hit
> similar problems.
>
> To that end, I've left one compute node un-rebooted for debugging purposes.
> The downstream discussion is ongoing, but I'll update here if we find
> anything.
>

I just happened to wander across the bug from last time:
https://bugs.launchpad.net/tripleo/+bug/1719334
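
To gauge how many of the tag-4095 ports described above have piled up on the compute node left un-rebooted, one could tally the output of `ovs-vsctl show`. Below is a minimal Python sketch, assuming that output has the `Port` / `tag:` / `Interface` shape quoted above; `parse_port_tags`, `summarize` and `print_report` are hypothetical helpers, not part of OVS or Neutron. For context, 4095 is the "dead VLAN" tag (`DEAD_VLAN_TAG`) that Neutron's OVS agent assigns to ports it isolates; whether a pile-up of such ports is what exhausts OVS is still the open question in this thread.

```python
import re
from collections import Counter

# Neutron's OVS agent isolates ("parks") ports on this VLAN tag.
DEAD_VLAN_TAG = 4095

# Lines of interest in `ovs-vsctl show` output, e.g.
#         Port "qvod2cade14-7c"
#             tag: 4095
# Port names may or may not be quoted depending on the OVS version.
PORT_RE = re.compile(r'^\s*Port\s+"?([^"\s]+)"?\s*$')
TAG_RE = re.compile(r'^\s*tag:\s*(\d+)\s*$')


def parse_port_tags(text):
    """Map each port name to its VLAN tag, or None if the port is untagged."""
    tags = {}
    current = None
    for line in text.splitlines():
        port = PORT_RE.match(line)
        if port:
            current = port.group(1)
            tags[current] = None
            continue
        tag = TAG_RE.match(line)
        if tag and current is not None:
            # A "tag:" line belongs to the most recent Port line.
            tags[current] = int(tag.group(1))
    return tags


def summarize(text, dead_tag=DEAD_VLAN_TAG):
    """Return (total ports, Counter of ports per tag, sorted port names on dead_tag)."""
    tags = parse_port_tags(text)
    per_tag = Counter(tags.values())
    dead = sorted(name for name, tag in tags.items() if tag == dead_tag)
    return len(tags), per_tag, dead


def print_report(text):
    """Print the port total, the dead-tag count and the five most common tags."""
    total, per_tag, dead = summarize(text)
    print(f"{total} ports total, {len(dead)} tagged {DEAD_VLAN_TAG}")
    for tag, count in per_tag.most_common(5):
        label = "untagged" if tag is None else f"tag {tag}"
        print(f"  {label}: {count}")
```

For example, run `ovs-vsctl show > ports.txt` on the compute node, then call `print_report(open("ports.txt").read())`. Parsing saved text rather than calling OVS directly keeps the sketch usable on a snapshot taken before the next reboot.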

>
>>
>> Thanks,
>> Joe
>>
>> On Mon, Dec 18, 2017 at 10:43 AM, Ben Nemec <openstack at nemebean.com> wrote:
>>>
>>> Hi,
>>>
>>> It's that magical time again.  You know the one, when we reboot rh1 to
>>> avoid OVS port exhaustion. :-)
>>>
>>> If all goes well you won't even notice that this is happening, but there
>>> is the possibility that a few jobs will fail while the te-broker host is
>>> rebooted, so I wanted to let everyone know.  If you notice that anything
>>> else hosted in rh1 is down (tripleo.org, zuul-status, etc.), let me know.
>>> I have been known to forget to restart services after the reboot.
>>>
>>> I'll send a followup when I'm done.
>>>
>>> -Ben
>>>
>>>
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


