[Openstack-operators] Neutron getting stuck creating namespaces
jbajin at verisign.com
Tue Nov 24 13:26:13 UTC 2015
We haven’t seen the bad namespaces issue, but we have experienced an issue where our node eventually started to see soft lockups like these:
kernel: BUG: soft lockup - CPU#0 stuck for 22s!
We noticed it once we hit a large number of namespaces. It was definitely over 400, because we hadn’t realized that the option to delete namespaces had been reverted from true to false a few releases ago. We cleaned up the namespaces and the errors stopped showing up. Over time the namespace count rose to a high level again, and this time we were lucky enough to have the soft lockup hit the kernel scheduler rather than the neutron process. That is when our reboot happened: the system detected that it was hung and restarted itself.
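For anyone who wants to keep an eye on the count, here is a rough sketch in Python, not something we run in production. It assumes the usual qrouter- and qdhcp- namespace name prefixes and parses the text printed by `ip netns list`. The function names and the 400 threshold are made up for illustration; 400 is only the rough figure mentioned above, not a known kernel limit.

```python
import subprocess
from collections import Counter

# Namespace name prefixes normally used by the Neutron L3 and DHCP agents.
PREFIXES = ("qrouter-", "qdhcp-")


def count_namespaces(netns_output):
    """Count namespaces per prefix from the text printed by `ip netns list`."""
    counts = Counter()
    for line in netns_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Newer iproute2 prints "name (id: N)", so keep only the first field.
        name = fields[0]
        for prefix in PREFIXES:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            counts["other"] += 1
    return counts


def over_threshold(counts, threshold=400):
    """True when the total count exceeds threshold (400 is an assumption)."""
    return sum(counts.values()) > threshold


def main():
    """Hypothetical entry point: needs to run on the network node itself."""
    out = subprocess.check_output(["ip", "netns", "list"],
                                  universal_newlines=True)
    counts = count_namespaces(out)
    print(dict(counts), "over threshold:", over_threshold(counts))
```

Calling main() from cron or a monitoring check on the network node would give a per-prefix count, so you can see the numbers creep up well before lockups start.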
On 11/24/15, 4:14 AM, "Saverio Proto" <zioproto at gmail.com> wrote:
>We also had problems with namespaces in Juno, maybe a little different
>from what you describe.
>We are running about 250 namespaces on our network node. When we
>reboot the network node we observe that some namespaces have qr-* and
>qg-* interfaces missing.
>We believe that is because the control plane in Neutron Juno performs
>very badly. This is probably fixed in Kilo.
>To work around it, after the network node is up and running, we
>reset the namespaces that have interfaces missing:
> neutron router-update <UUID> --admin-state-up false
> sleep 5
> neutron router-update <UUID> --admin-state-up true
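The workaround quoted above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names are hypothetical, the interface check parses the text of `ip -o link show` run inside a router namespace, and the toggle shells out to exactly the neutron router-update commands shown in the quote. Note that a router with no external gateway has no qg-* port at all, so a missing qg- prefix is not always a fault.

```python
import subprocess
import time

# Interface name prefixes expected inside a router namespace.
ROUTER_PORT_PREFIXES = ("qr-", "qg-")


def missing_prefixes(link_output):
    """Return the expected prefixes not found in `ip -o link show` output."""
    names = []
    for line in link_output.splitlines():
        fields = line.split(": ")
        if len(fields) > 1:
            # Drop any "@ifN" peer suffix from the interface name.
            names.append(fields[1].split("@")[0])
    return [p for p in ROUTER_PORT_PREFIXES
            if not any(n.startswith(p) for n in names)]


def toggle_commands(router_id):
    """Build the admin-state down/up command pair from the quoted workaround."""
    base = ["neutron", "router-update", router_id, "--admin-state-up"]
    return [base + ["false"], base + ["true"]]


def reset_routers(router_ids, pause=5,
                  run=subprocess.check_call, sleep=time.sleep):
    """Toggle each router down and up again, pausing in between."""
    for router_id in router_ids:
        down, up = toggle_commands(router_id)
        run(down)
        sleep(pause)
        run(up)
```

The run and sleep parameters exist only so the loop can be dry-run with stand-ins before pointing it at a real network node.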
>2015-11-24 9:51 GMT+01:00 Xav Paice <xavpaice at gmail.com>:
>> Neutron is Juno, on Trusty boxes with the 3.19 LTS kernel. We're in the
>> process of updating to Kilo, and onwards to Liberty.
>> On 24 November 2015 at 21:24, Saverio Proto <zioproto at gmail.com> wrote:
>>> Hello Xav,
>>> What version of OpenStack are you running?
>>> Thank you
>>> 2015-11-23 20:04 GMT+01:00 Xav Paice <xavpaice at gmail.com>:
>>> > Hi,
>>> > Over the last few months we've had a few incidents where the process to
>>> > create network namespaces (Neutron, OVS) on the network nodes gets 'stuck'
>>> > and prevents not only the router it's trying to create from finishing, but
>>> > all further namespace operations too.
>>> > This has usually finished up with either us rebooting the node pretty fast
>>> > afterwards, or the node rebooting itself.
>>> > It looks very much like we're affected by
>>> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152 but the notes
>>> > say it's fixed in the kernel we're running. I've asked the clever person
>>> > who checked it to make some extra notes in the bug report.
>>> > It looks very much like when we have a bunch of load on the box the thing is
>>> > more likely to trigger - I was wondering if other ops have a max ratio of
>>> > routers per network node? I would have thought our current max of 150
>>> > routers per node would be pretty light, but with the dhcp namespaces as well
>>> > that's ~450 namespaces on a box and maybe that's an issue?
>>> > Thanks
>>> > _______________________________________________
>>> > OpenStack-operators mailing list
>>> > OpenStack-operators at lists.openstack.org
>>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators