[Openstack-operators] Neutron crashed hard

Joe Topjian joe at topjian.net
Thu Dec 19 02:33:35 UTC 2013


I set up an internal OpenStack cloud to give a workshop for around 15
people. I decided to use Neutron as I'm trying to get more experience with
it. The cloud consisted of a cloud controller and four compute nodes. Very
decent Dell hardware, Ubuntu 12.04, Havana 2013.2.0.

Neutron was configured with the OVS plugin, non-overlapping IPs, and a
single shared subnet. GRE tunnelling was used between compute nodes.

Everything was working fine until the 15 people tried launching a CirrOS
instance at approximately the same time.

Then Neutron crashed.

The compute nodes had this in their logs:

2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager ConnectionFailed:
Connection to neutron failed: timed out
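
To see when the timeouts began and how densely they clustered, trace lines
like the one above can be tallied per minute. This is a minimal sketch that
assumes every log line follows the "timestamp pid LEVEL logger message"
layout shown above and is not wrapped in the log file itself; the names
LINE_RE and count_neutron_timeouts are illustrative only.

```python
import re
from collections import Counter

# Matches lines shaped like the trace quoted above:
# <date> <time.millis> <pid> <LEVEL> <logger> <message>
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ "
    r"\d+ (?P<level>[A-Z]+) (?P<logger>\S+) (?P<message>.*)$"
)


def count_neutron_timeouts(lines):
    """Count 'Connection to neutron failed' log lines, grouped by minute."""
    per_minute = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "Connection to neutron failed" in match.group("message"):
            per_minute[match.group("minute")] += 1
    return per_minute
```

The function accepts any iterable of strings, so an open file handle for a
compute node's nova-compute log can be passed in directly (the log's path
depends on the installation).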

All instances went into an Error state.

Restarting the Neutron services did no good. Terminating the Error'd
instances seemed to make the problem worse -- the entire cloud became
unavailable (meaning, both Horizon and Nova were unusable as they would
time out waiting for Neutron).

We moved on to a different cloud to continue the workshop. I would
occasionally issue "neutron net-list" in the original cloud to see if I
would get a result. It took about an hour before the command returned one.
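
That periodic check can be automated. The following is a minimal sketch,
assuming the neutron CLI is on the PATH and credentials are already set in
the environment. The function name wait_for_command, the 30-second timeout,
and the 60-second interval are illustrative choices, not part of the
original setup.

```python
import subprocess
import time


def wait_for_command(cmd, timeout=30, interval=60, max_attempts=None):
    """Run cmd repeatedly until it exits with status 0.

    Returns (attempts, elapsed_seconds) on success, or None if
    max_attempts is reached without a successful run.
    """
    start = time.monotonic()
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            if result.returncode == 0:
                return attempts, time.monotonic() - start
        except (subprocess.TimeoutExpired, OSError):
            # A hung command or one that cannot start counts as a failure.
            pass
        if max_attempts is not None and attempts >= max_attempts:
            break
        time.sleep(interval)
    return None
```

Calling wait_for_command(["neutron", "net-list"]) would block until the API
answers again and report how long that took. The per-call timeout matters
because a command that hangs indefinitely would otherwise stall the loop.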

What happened?

I've read about Neutron performance issues -- would this be something along
those lines?

What's the best way to quickly recover from a situation like this?

Since then, I haven't recreated the database, networks, or anything like
that. Is there a specific log or database table I can look at to find out
exactly how this situation came about?
