[Openstack-operators] Neutron crashed hard

Joe Topjian joe at topjian.net
Thu Dec 19 05:28:52 UTC 2013


Thanks for the input. I'm using memcache as a token store already, though.


On Wed, Dec 18, 2013 at 9:37 PM, Erik McCormick <emccormick at cirrusseven.com> wrote:

> It sounds more to me like your database went AWOL than like a Neutron problem.
> Assuming you had done a bit of mucking around testing the cluster before
> this event, is there any chance you're not using memcached and your tokens
> table has grown large? You might want to switch over to memcached for
> Keystone and see if that doesn't make it happier.
> On Dec 18, 2013 9:40 PM, "Joe Topjian" <joe at topjian.net> wrote:
>
>> Hello,
>>
>> I set up an internal OpenStack cloud to give a workshop for around 15
>> people. I decided to use Neutron as I'm trying to get more experience with
>> it. The cloud consisted of a cloud controller and four compute nodes. Very
>> decent Dell hardware, Ubuntu 12.04, Havana 2013.2.0.
>>
>> Neutron was configured with the OVS plugin, non-overlapping IPs, and a
>> single shared subnet. GRE tunnelling was used between compute nodes.
>>
>> Everything was working fine until the 15 people tried launching a CirrOS
>> instance at approximately the same time.
>>
>> Then Neutron crashed.
>>
>> The compute nodes had this in their logs:
>>
>> 2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager ConnectionFailed: Connection to neutron failed: timed out
>>
>> All instances went into an Error state.
>>
>> Restarting the Neutron services did no good. Terminating the Error'd
>> instances seemed to make the problem worse: the entire cloud became
>> unavailable (both Horizon and Nova were unusable because they timed
>> out waiting for Neutron).
>>
>> We moved to a different cloud to continue the workshop. I would
>> occasionally run "neutron net-list" against the original cloud to see
>> whether it returned anything. It took about an hour before I got a result.
>>
>> What happened?
>>
>> I've read about Neutron performance issues -- would this be something
>> along those lines?
>>
>> What's the best way to quickly recover from a situation like this?
>>
>> Since then, I haven't recreated the database, networks, or anything like
>> that. Is there a specific log or database table I can look for to see more
>> information on how exactly this situation happened?
>>
>> Thanks,
>> Joe
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>>
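On the question of which log to look at: the compute-node trace quoted above uses the usual OpenStack log prefix (date, time with milliseconds, PID, level, logger name, message). A minimal Python sketch, assuming that exact prefix format, that buckets the Neutron timeout traces per minute so you can see when the failures started and how quickly they piled up across the 15 simultaneous launches. The helper name `neutron_timeouts_per_minute` is hypothetical, and the matching rule (a message containing both "ConnectionFailed" and "timed out") is an assumption based only on the single line shown.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log prefix, taken from the quoted trace, e.g.
# "2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager ConnectionFailed: ..."
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}):\d{2}\.\d+ "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) ?(?P<msg>.*)$"
)


def neutron_timeouts_per_minute(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count Neutron connection timeouts per minute."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that do not carry the assumed prefix
        msg = m.group("msg")
        if "ConnectionFailed" in msg and "timed out" in msg:
            # Key is "YYYY-MM-DD HH:MM", i.e. seconds are dropped.
            counts[f"{m.group('date')} {m.group('time')}"] += 1
    return counts
```

To run it on a compute node, pass an open log file, for example `neutron_timeouts_per_minute(open("/var/log/nova/nova-compute.log"))`; that path is the usual location for Ubuntu packages but is an assumption here, so adjust it to your setup. Running the same count on every compute node and comparing the first timestamps can help show whether the timeouts began everywhere at once, which would point at the Neutron server or its database rather than at a single node.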