[Openstack-operators] Neutron crashed hard
joe at topjian.net
Thu Dec 19 02:33:35 UTC 2013
I set up an internal OpenStack cloud to give a workshop for around 15
people. I decided to use Neutron as I'm trying to get more experience with
it. The cloud consisted of a cloud controller and four compute nodes. Very
decent Dell hardware, Ubuntu 12.04, Havana 2013.2.0.
Neutron was configured with the OVS plugin, non-overlapping IPs, and a
single shared subnet. GRE tunnelling was used between compute nodes.
Everything was working fine until the 15 people tried launching a CirrOS
instance at approximately the same time.
Then Neutron crashed.
The compute nodes had this in their logs:
2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager ConnectionFailed: Connection to neutron failed: timed out
All instances went into an Error state.
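To see when the timeouts started and how many there were, the trace lines can be tallied per minute. This is only a rough sketch: it assumes each log entry sits on a single line in the compute node's log, in the same format as the excerpt above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches entries shaped like the excerpt above:
# "<date> <time>.<ms> <pid> TRACE <logger> ConnectionFailed: Connection to neutron failed..."
# Group 1 captures the timestamp truncated to the minute.
NEUTRON_TIMEOUT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ \d+ TRACE \S+ "
    r"ConnectionFailed: Connection to neutron failed"
)


def timeouts_per_minute(lines):
    """Count 'Connection to neutron failed' trace lines per minute (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = NEUTRON_TIMEOUT.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file (or any iterable of lines) returns a Counter keyed by minute, so a sudden jump should line up with the moment everyone launched their instances.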
Restarting the Neutron services did no good. Terminating the Error'd
instances seemed to make the problem worse -- the entire cloud became
unavailable (meaning, both Horizon and Nova were unusable as they would
time out waiting for Neutron).
We moved on to a different cloud to continue on with the workshop. I would
occasionally issue "neutron net-list" in the original cloud to see if I
would get a result. It took about an hour before the command returned one.
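That manual check could be automated along these lines. It is a minimal sketch, not something I ran at the time: the function name, retry interval, and timeouts are arbitrary assumptions, and it simply re-runs a command until it exits successfully.

```python
import subprocess
import time


def wait_until_responsive(cmd, per_try_timeout=30.0, interval=60.0, max_wait=7200.0):
    """Re-run cmd until it exits with status 0.

    Returns the elapsed seconds on success, or None if the next retry
    would exceed max_wait. All defaults are arbitrary assumptions.
    """
    start = time.monotonic()
    while True:
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=per_try_timeout,
            )
            if result.returncode == 0:
                return time.monotonic() - start
        except subprocess.TimeoutExpired:
            # The command hung, as "neutron net-list" did; try again later.
            pass
        if time.monotonic() - start + interval > max_wait:
            return None
        time.sleep(interval)


# Example (assumes the neutron CLI is installed and credentials are set):
# elapsed = wait_until_responsive(["neutron", "net-list"])
```

The returned value gives a rough measure of how long the API stayed unresponsive, accurate to within one retry interval.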
I've read about Neutron performance issues -- would this be something along
those lines?
What's the best way to quickly recover from a situation like this?
Since then, I haven't recreated the database, networks, or anything like
that. Is there a specific log or database table I can look at to see more
information on how exactly this situation happened?