[openstack-dev] [TripleO] Tis the season...for a cloud reboot
aschultz at redhat.com
Tue Dec 19 17:34:55 UTC 2017
On Tue, Dec 19, 2017 at 9:53 AM, Ben Nemec <openstack at nemebean.com> wrote:
> The reboot is done (mostly...see below).
> On 12/18/2017 05:11 PM, Joe Talerico wrote:
>> Ben - Can you provide some links to the ovs port exhaustion issue for
>> some background?
> I don't know if we ever had a bug opened, but there's some discussion of it
> I've also copied Derek since I believe he was the one who found it.
> The gist is that after about 3 months of tripleo-ci running in this cloud we
> start to hit errors creating instances because of problems creating OVS
> ports on the compute nodes. Sometimes we see a huge number of ports in
> general, other times we see a lot of ports that look like this:
> Port "qvod2cade14-7c"
> tag: 4095
> Interface "qvod2cade14-7c"
> Notably they all have a tag of 4095, which seems suspicious to me. I don't
> know whether it's actually an issue though.
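
For anyone who wants to check a compute node for the pattern described above, here is a minimal sketch that counts ports carrying a given tag in the text printed by `ovs-vsctl show`. It assumes the Port / tag / Interface layout shown in the excerpt. The function name, the sample text, and the second port name in the sample are made up for illustration and are not taken from rh1.

```python
import re

# Matches lines such as:  Port "qvod2cade14-7c"   or   Port br-int
PORT_RE = re.compile(r'^\s*Port\s+"?([^"\s]+)"?\s*$')
# Matches lines such as:  tag: 4095
TAG_RE = re.compile(r'^\s*tag:\s*(\d+)\s*$')


def ports_with_tag(show_output, tag=4095):
    """Return the names of ports whose 'tag:' line equals `tag`.

    `show_output` is expected to be the text printed by `ovs-vsctl show`.
    """
    matches = []
    current_port = None
    for line in show_output.splitlines():
        port_match = PORT_RE.match(line)
        if port_match:
            # Remember the most recent Port so a following tag line
            # can be attributed to it.
            current_port = port_match.group(1)
            continue
        tag_match = TAG_RE.match(line)
        if tag_match and current_port is not None:
            if int(tag_match.group(1)) == tag:
                matches.append(current_port)
    return matches


# Hypothetical sample: only the first port appears in the thread,
# the others are invented to exercise the parser.
SAMPLE = '''\
    Bridge br-int
        Port "qvod2cade14-7c"
            tag: 4095
            Interface "qvod2cade14-7c"
        Port "qvo11111111-aa"
            tag: 12
            Interface "qvo11111111-aa"
        Port br-int
            Interface br-int
                type: internal
'''

if __name__ == "__main__":
    suspicious = ports_with_tag(SAMPLE)
    print(len(suspicious), suspicious)
```

On a real node the input could come from something like `subprocess.check_output(["ovs-vsctl", "show"], text=True)`, run with enough privileges to talk to OVS. Tracking the count over time would show whether the tag 4095 ports accumulate as the cloud ages.
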
> I've had some offline discussions about getting someone on this cloud to
> debug the problem. Originally we decided not to pursue it since it's not
> hard to work around and we didn't want to disrupt the environment by trying
> to move to later OpenStack code (we're still back on Mitaka), but it was
> pointed out to me this time around that from a downstream perspective we
> have users on older code as well and it may be worth debugging to make sure
> they don't hit similar problems.
> To that end, I've left one compute node un-rebooted for debugging purposes.
> The downstream discussion is ongoing, but I'll update here if we find
> anything.
I just so happened to wander across the bug from last time,
>> On Mon, Dec 18, 2017 at 10:43 AM, Ben Nemec <openstack at nemebean.com> wrote:
>>> It's that magical time again. You know the one, when we reboot rh1 to avoid
>>> OVS port exhaustion. :-)
>>> If all goes well you won't even notice that this is happening, but there is
>>> the possibility that a few jobs will fail while the te-broker host is
>>> rebooted so I wanted to let everyone know. If you notice anything else
>>> hosted in rh1 is down (tripleo.org, zuul-status, etc.) let me know. I have
>>> been known to forget to restart services after the reboot.
>>> I'll send a followup when I'm done.
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe