<div dir="ltr"><div>I am using OpenStack with nova-compute to manage vCenter, and I create an instance and open its VNC console from the dashboard. Following the official documentation, I added the following to nova.conf:</div><div>compute_driver = vmwareapi.VMwareVCDriver</div>

<div>vmwareapi_host_ip = 200.21.0.99</div><div>vmwareapi_host_username = administrator</div><div>vmwareapi_host_password = root123.</div><div>vmwareapi_cluster_name = openstack_vs</div><div>vmwareapi_wsdl_loc = <a href="http://200.21.4.18:8080/vmware/SDK/vsphere-ws/wsdl/vim25/vimService.wsdl">http://200.21.4.18:8080/vmware/SDK/vsphere-ws/wsdl/vim25/vimService.wsdl</a></div>
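<div><br></div><div>As a rough way to double-check the values above, here is a minimal Python sketch that parses the same settings and reports which host each one points to. It assumes the options sit in the [DEFAULT] section of nova.conf; the embedded text and the helper name host_of are only illustrative and are not part of nova.</div>
<pre>
```python
import configparser
from urllib.parse import urlparse

# Settings copied from the message above; the [DEFAULT] section is an assumption.
NOVA_CONF = """
[DEFAULT]
compute_driver = vmwareapi.VMwareVCDriver
vmwareapi_host_ip = 200.21.0.99
vmwareapi_cluster_name = openstack_vs
vmwareapi_wsdl_loc = http://200.21.4.18:8080/vmware/SDK/vsphere-ws/wsdl/vim25/vimService.wsdl
"""


def host_of(url):
    """Return the host part of a URL (hypothetical helper)."""
    return urlparse(url).hostname


# interpolation=None keeps any '%' characters in values literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(NOVA_CONF)
conf = parser["DEFAULT"]

vcenter_host = conf["vmwareapi_host_ip"]
wsdl_host = host_of(conf["vmwareapi_wsdl_loc"])

print("driver:", conf["compute_driver"])
print("vCenter host:", vcenter_host)
print("WSDL served from:", wsdl_host)
```
</pre>
<div>With these values the WSDL file is served from 200.21.4.18 rather than from the vCenter host 200.21.0.99, so presumably that URL has to be reachable from nova-compute.</div>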

<div><br></div><div>The VNC configuration options in nova.conf are as follows:<br></div><div><div>vncserver_listen=0.0.0.0</div><div>vnc_enabled=true</div><div>novncproxy_base_url=<a href="http://200.21.4.17:6080/vnc_auto.html">http://200.21.4.17:6080/vnc_auto.html</a></div>

<div>novncproxy_port=6080</div></div><div><br></div><div><br></div><div>My control node, compute node, and vCenter host are 200.21.4.17, 200.21.4.18, and 200.21.0.99, respectively.<br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">

On Thu, Dec 19, 2013 at 8:00 PM,  <span dir="ltr"><<a href="mailto:openstack-operators-request@lists.openstack.org" target="_blank">openstack-operators-request@lists.openstack.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Today's Topics:<br>

<br>

   1. AUTO: Ophir Solonikov is out of the office        (returning<br>

      22/12/2013) (Ophir Solonikov)<br>

   2. Re: IGNORED qpid_hosts in glance-api.conf (Russell Bryant)<br>

   3. Neutron crashed hard (Joe Topjian)<br>

   4. Re: Neutron crashed hard (Erik McCormick)<br>

   5. Re: Neutron crashed hard (Jay Pipes)<br>

   6. Re: Neutron crashed hard (Joe Topjian)<br>

   7. Re: Neutron crashed hard (Joe Topjian)<br>

   8. Re: Neutron crashed hard (Jay Pipes)<br>

   9. Re: Neutron crashed hard (Joe Topjian)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Wed, 18 Dec 2013 16:07:03 +0200<br>

From: Ophir Solonikov <<a href="mailto:OPHIRS@il.ibm.com">OPHIRS@il.ibm.com</a>><br>

To: <a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a><br>

Subject: [Openstack-operators] AUTO: Ophir Solonikov is out of the<br>

        office  (returning 22/12/2013)<br>

Message-ID:<br>

        <<a href="mailto:OFA9F78110.BC21C4C6-ONC2257C45.004D8CE2-C2257C45.004D8CE2@il.ibm.com">OFA9F78110.BC21C4C6-ONC2257C45.004D8CE2-C2257C45.004D8CE2@il.ibm.com</a>><br>

Content-Type: text/plain; charset=US-ASCII<br>

<br>

<br>

I am out of the office until 22/12/2013.<br>

<br>

<br>

<br>

<br>

Note: This is an automated response to your message  "OpenStack-operators<br>

Digest, Vol 38, Issue 13" sent on 18/12/2013 14:00:02.<br>

<br>

This is the only notification you will receive while this person is away.<br>

<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Wed, 18 Dec 2013 20:55:44 -0500<br>

From: Russell Bryant <<a href="mailto:rbryant@redhat.com">rbryant@redhat.com</a>><br>

To: <a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a><br>

Subject: Re: [Openstack-operators] IGNORED qpid_hosts in<br>

        glance-api.conf<br>

Message-ID: <<a href="mailto:52B25220.9060006@redhat.com">52B25220.9060006@redhat.com</a>><br>

Content-Type: text/plain; charset=ISO-8859-1<br>

<br>

On 12/16/2013 08:47 AM, Alvise Dorigo wrote:<br>

> Hi,<br>

> in this<br>

> page <a href="http://openstack.redhat.com/Highly_Available_Qpid_for_OpenStack#Clustered_without_pacemaker" target="_blank">http://openstack.redhat.com/Highly_Available_Qpid_for_OpenStack#Clustered_without_pacemaker</a><br>


> they explain how to make a glance-api service to connect to a redundant<br>

> qpid broker server on two or more hosts. The parameter to use is<br>

> qpid_hosts, but it seems to be ignored.<br>

><br>

> In fact, if I switch qpid off on the first host listed in the<br>

> qpid_hosts, the glance-api will continue to try to connect to it<br>

> receiving "connection refused"... and never tries to connect to the<br>

> other one, in which qpid is still alive.<br>

><br>

> Any hint ?<br>

><br>

> In the following the relevant part of glance-api.conf<br>

> ==========================<br>

> notifier_strategy=qpid<br>

><br>

> # Configuration options if sending notifications via Qpid (these are<br>

> # the defaults)<br>

> qpid_notification_exchange = glance<br>

> qpid_notification_topic = notifications<br>

> qpid_hosts=<a href="http://192.168.122.41:5672" target="_blank">192.168.122.41:5672</a>,<a href="http://192.168.122.42:5672" target="_blank">192.168.122.42:5672</a><br>

> #qpid_hostname = localhost<br>

> qpid_username =<br>

> qpid_password =<br>

> qpid_sasl_mechanisms =<br>

> qpid_reconnect_timeout = 0<br>

> qpid_reconnect_limit = 0<br>

> qpid_reconnect_interval_min = 0<br>

> qpid_reconnect_interval_max = 0<br>

> qpid_reconnect_interval = 0<br>

> qpid_heartbeat=2<br>

> # Set to 'ssl' to enable SSL<br>

> qpid_protocol = tcp<br>

> qpid_tcp_nodelay = True<br>

> ==========================<br>

<br>

I suspect qpid_hosts isn't supported by glance.  Until very recently<br>

(Icehouse), glance has used its own qpid and rabbit code for<br>

notifications, separate from the code used by every other service.<br>
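<br>

For illustration only, the failover behaviour the original poster expected<br>
from qpid_hosts could be sketched in Python as below. This is not glance's or<br>
qpid's actual code; parse_hosts, first_reachable and the connect callback are<br>
hypothetical names, and fake_connect merely simulates the first broker being off.<br>
<pre>
```python
def parse_hosts(value):
    """Split a 'host:port,host:port' string into (host, port) tuples."""
    hosts = []
    for item in value.split(","):
        host, _, port = item.strip().rpartition(":")
        hosts.append((host, int(port)))
    return hosts


def first_reachable(hosts, connect):
    """Try each broker in order and return the first one that accepts.

    `connect` is any callable that raises ConnectionRefusedError
    (or another OSError) when the broker is down.
    """
    for host, port in hosts:
        try:
            connect(host, port)
            return host, port
        except OSError:
            continue  # broker down, fall through to the next one
    raise ConnectionError("no broker in the list is reachable")


def fake_connect(host, port):
    # Stand-in for a real connection: pretend the first broker is off.
    if host == "192.168.122.41":
        raise ConnectionRefusedError(host)


brokers = parse_hosts("192.168.122.41:5672,192.168.122.42:5672")
print(first_reachable(brokers, fake_connect))
```
</pre>
The behaviour reported in the question corresponds to never leaving the first<br>
entry of the list, rather than moving on as the loop above does.<br>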

<br>

--<br>

Russell Bryant<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 3<br>

Date: Wed, 18 Dec 2013 19:33:35 -0700<br>

From: Joe Topjian <<a href="mailto:joe@topjian.net">joe@topjian.net</a>><br>

To: <a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a><br>

Subject: [Openstack-operators] Neutron crashed hard<br>

Message-ID:<br>

        <<a href="mailto:CA%2By7hviSZVVaNk47dWVqe1boLm5vSOq5D8qsdjyjEjwT7VAVwg@mail.gmail.com">CA+y7hviSZVVaNk47dWVqe1boLm5vSOq5D8qsdjyjEjwT7VAVwg@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="iso-8859-1"<br>

<br>

Hello,<br>

<br>

I set up an internal OpenStack cloud to give a workshop for around 15<br>

people. I decided to use Neutron as I'm trying to get more experience with<br>

it. The cloud consisted of a cloud controller and four compute nodes. Very<br>

decent Dell hardware, Ubuntu 12.04, Havana 2013.2.0.<br>

<br>

Neutron was configured with the OVS plugin, non-overlapping IPs, and a<br>

single shared subnet. GRE tunnelling was used between compute nodes.<br>

<br>

Everything was working fine until the 15 people tried launching a CirrOS<br>

instance at approximately the same time.<br>

<br>

Then Neutron crashed.<br>

<br>

The compute nodes had this in their logs:<br>

<br>

2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager ConnectionFailed:<br>

Connection to neutron failed: timed out<br>

<br>

All instances went into an Error state.<br>

<br>

Restarting the Neutron services did no good. Terminating the Error'd<br>

instances seemed to make the problem worse -- the entire cloud became<br>

unavailable (meaning, both Horizon and Nova were unusable as they would<br>

time out waiting for Neutron).<br>

<br>

We moved on to a different cloud to continue on with the workshop. I would<br>

occasionally issue "neutron net-list" in the original cloud to see if I<br>

would get a result. It took about an hour.<br>

<br>

What happened?<br>

<br>

I've read about Neutron performance issues -- would this be something along<br>

those lines?<br>

<br>

What's the best way to quickly recover from a situation like this?<br>

<br>

Since then, I haven't recreated the database, networks, or anything like<br>

that. Is there a specific log or database table I can look for to see more<br>

information on how exactly this situation happened?<br>

<br>

Thanks,<br>

Joe<br>


<br>

------------------------------<br>

<br>

Message: 4<br>

Date: Wed, 18 Dec 2013 23:37:57 -0500<br>

From: Erik McCormick <<a href="mailto:emccormick@cirrusseven.com">emccormick@cirrusseven.com</a>><br>

To: Joe Topjian <<a href="mailto:joe@topjian.net">joe@topjian.net</a>><br>

Cc: <a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a><br>

Subject: Re: [Openstack-operators] Neutron crashed hard<br>

Message-ID:<br>

        <<a href="mailto:CAHUi5cOjdP7B2wdqkbufGKVvejpH53UY4qG6zBs9L66W9M1PJQ@mail.gmail.com">CAHUi5cOjdP7B2wdqkbufGKVvejpH53UY4qG6zBs9L66W9M1PJQ@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="iso-8859-1"<br>

<br>

It sounds more to me like your database went awol than a neutron problem.<br>

Assuming you had done a bit of mucking around testing the cluster before<br>

this event, is there any chance you're not using memcached and your tokens<br>

table has grown large? You might want to switch over to memcached for<br>

Keystone and see if that doesn't make it happier.<br>



<br>

------------------------------<br>

<br>

Message: 5<br>

Date: Thu, 19 Dec 2013 00:05:00 -0500<br>

From: Jay Pipes <<a href="mailto:jaypipes@gmail.com">jaypipes@gmail.com</a>><br>

To: <a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a><br>

Subject: Re: [Openstack-operators] Neutron crashed hard<br>

Message-ID: <<a href="mailto:52B27E7C.7070008@gmail.com">52B27E7C.7070008@gmail.com</a>><br>

Content-Type: text/plain; charset=ISO-8859-1; format=flowed<br>

<br>

On 12/18/2013 09:33 PM, Joe Topjian wrote:<br>

> Hello,<br>

><br>

> I set up an internal OpenStack cloud to give a workshop for around 15<br>

> people. I decided to use Neutron as I'm trying to get more experience<br>

> with it. The cloud consisted of a cloud controller and four compute<br>

> nodes. Very decent Dell hardware, Ubuntu 12.04, Havana 2013.2.0.<br>

><br>

> Neutron was configured with the OVS plugin, non-overlapping IPs, and a<br>

> single shared subnet. GRE tunnelling was used between compute nodes.<br>

<br>

What version of OVS did you deploy? There's a bad bug/behavior in OVS<br>

1.04 that can result in circular routes in the GRE mesh, which we saw<br>

entirely take down an entire deployment zone with tenant traffic<br>

swamping the bonded NIC that was housing the GRE overlay network.<br>

Upgrading to OVS 1.10 and then 1.11 solved that issue along with some<br>

scripting...<br>

<br>

> Everything was working fine until the 15 people tried launching a CirrOS<br>

> instance at approximately the same time.<br>

><br>

> Then Neutron crashed.<br>

><br>

> The compute nodes had this in their logs:<br>

><br>

> 2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager<br>

> ConnectionFailed: Connection to neutron failed: timed out<br>

><br>

> All instances went into an Error state.<br>

><br>

> Restarting the Neutron services did no good. Terminating the Error'd<br>

> instances seemed to make the problem worse -- the entire cloud became<br>

> unavailable (meaning, both Horizon and Nova were unusable as they would<br>

> time out waiting for Neutron).<br>

><br>

> We moved on to a different cloud to continue on with the workshop. I<br>

> would occasionally issue "neutron net-list" in the original cloud to see<br>

> if I would get a result. It took about an hour.<br>

><br>

> What happened?<br>

><br>

> I've read about Neutron performance issues -- would this be something<br>

> along those lines?<br>

<br>

Tough to tell. It very well could be, or it could be OVS itself.<br>

<br>

Look in the Neutron L3 agent, neutron-plugin-openvswitch-agent log (both<br>

on the L3 router node and the compute workers) and neutron-server logs<br>

for errors. It may be some contention issues on the database or MQ end<br>

of things.<br>
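<br>

A minimal sketch of that kind of log search, in Python. The marker list and<br>
the function name error_lines are assumptions for illustration; the second<br>
sample line is the one quoted from the compute node above, the first is made up<br>
for contrast, and real log locations will vary by install.<br>
<pre>
```python
MARKERS = ("ERROR", "TRACE", "CRITICAL")  # assumed log levels of interest


def error_lines(lines, markers=MARKERS):
    """Return the lines whose level field matches one of the markers."""
    hits = []
    for line in lines:
        fields = line.split()
        # Expected layout: date, time, pid, LEVEL, logger, message...
        if len(fields) >= 4 and fields[3] in markers:
            hits.append(line.rstrip("\n"))
    return hits


sample = [
    # Made-up INFO line, only here to show that it is filtered out.
    "2013-12-18 09:52:57.101 28514 INFO nova.compute.manager Starting instance",
    # Line reported in this thread.
    "2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager "
    "ConnectionFailed: Connection to neutron failed: timed out",
]

for hit in error_lines(sample):
    print(hit)
```
</pre>
In practice an open log file can be passed instead of the sample list, since<br>
the function only iterates over lines.<br>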

<br>

Are you using a multi-plexed neutron server (workers config option > 1)?<br>

<br>

> What's the best way to quickly recover from a situation like this?<br>

<br>

There isn't one. Search through all your logs for Neutron and<br>

openvswitchd looking for issues.<br>

<br>

> Since then, I haven't recreated the database, networks, or anything like<br>

> that. Is there a specific log or database table I can look for to see<br>

> more information on how exactly this situation happened?<br>

<br>

You could look at your database slow log (if using MySQL), but I doubt<br>

you'll find anything in there... but you may get lucky.<br>

<br>

Let us know what you find.<br>

<br>

Best,<br>

-jay<br>

<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 6<br>

Date: Wed, 18 Dec 2013 22:28:52 -0700<br>

From: Joe Topjian <<a href="mailto:joe@topjian.net">joe@topjian.net</a>><br>

To: Erik McCormick <<a href="mailto:emccormick@cirrusseven.com">emccormick@cirrusseven.com</a>><br>

Cc: <a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a><br>

Subject: Re: [Openstack-operators] Neutron crashed hard<br>

Message-ID:<br>

        <CA+y7hvhF27W_2vjHzYcZi=<a href="mailto:A0sNTyAVuG7%2BDUwXFdW3RbQ%2BhUqg@mail.gmail.com">A0sNTyAVuG7+DUwXFdW3RbQ+hUqg@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="iso-8859-1"<br>

<br>

Thanks for the input. I'm using memcache as a token store already, though.<br>

<br>

<br>



<br>

------------------------------<br>

<br>

Message: 7<br>

Date: Wed, 18 Dec 2013 22:36:50 -0700<br>

From: Joe Topjian <<a href="mailto:joe@topjian.net">joe@topjian.net</a>><br>

To: Jay Pipes <<a href="mailto:jaypipes@gmail.com">jaypipes@gmail.com</a>><br>

Cc: <a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a><br>

Subject: Re: [Openstack-operators] Neutron crashed hard<br>

Message-ID:<br>

        <<a href="mailto:CA%2By7hviOYgRc6phjrzW0zTQZcjm_Rr018SFSEqUZ3%2Bgf_ZKB9g@mail.gmail.com">CA+y7hviOYgRc6phjrzW0zTQZcjm_Rr018SFSEqUZ3+gf_ZKB9g@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="iso-8859-1"<br>

<br>

Hi Jay,<br>

<br>

> What version of OVS did you deploy? There's a bad bug/behavior in OVS 1.04<br>

> that can result in circular routes in the GRE mesh, which we saw entirely<br>

> take down an entire deployment zone with tenant traffic swamping the bonded<br>

> NIC that was housing the GRE overlay network. Upgrading to OVS 1.10 and<br>

> then 1.11 solved that issue along with some scripting...<br>

><br>

<br>

OVS 1.10 is available in the Ubuntu havana repo. I'm using the Raring 3.8<br>

kernel available in the standard 12.04 repo. The standard 12.04 repo also<br>

includes OVS 1.9 kernel module to compile with the 3.8 kernel.<br>

<br>

So, long story short: OVS 1.10 with 1.9 kernel module.<br>

<br>

<br>

> Are you using a multi-plexed neutron server (workers config option > 1)?<br>

><br>

<br>

I haven't explicitly set this option as I didn't know it existed. Do you<br>

have a reference for this option? I did a quick scan/grep of the neutron<br>

config files and didn't see a reference to workers.<br>

<br>

Thanks,<br>

Joe<br>


<br>

------------------------------<br>

<br>

Message: 8<br>

Date: Thu, 19 Dec 2013 00:48:51 -0500<br>

From: Jay Pipes <<a href="mailto:jaypipes@gmail.com">jaypipes@gmail.com</a>><br>

To: Joe Topjian <<a href="mailto:joe@topjian.net">joe@topjian.net</a>><br>

Cc: <a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a><br>

Subject: Re: [Openstack-operators] Neutron crashed hard<br>

Message-ID: <<a href="mailto:52B288C3.7040408@gmail.com">52B288C3.7040408@gmail.com</a>><br>

Content-Type: text/plain; charset=ISO-8859-1; format=flowed<br>

<br>

On 12/19/2013 12:36 AM, Joe Topjian wrote:<br>

> Hi Jay,<br>

><br>

>     What version of OVS did you deploy? There's a bad bug/behavior in<br>

>     OVS 1.04 that can result in circular routes in the GRE mesh, which<br>

>     we saw entirely take down an entire deployment zone with tenant<br>

>     traffic swamping the bonded NIC that was housing the GRE overlay<br>

>     network. Upgrading to OVS 1.10 and then 1.11 solved that issue along<br>

>     with some scripting...<br>

><br>

><br>

> OVS 1.10 is available in the Ubuntu havana repo. I'm using the Raring<br>

> 3.8 kernel available in the standard 12.04 repo. The standard 12.04 repo<br>

> also includes OVS 1.9 kernel module to compile with the 3.8 kernel.<br>

><br>

> So, long story short: OVS 1.10 with 1.9 kernel module.<br>

<br>

OK, then I think you are clear of the circular routing issue we saw.<br>

<br>

>     Are you using a multi-plexed neutron server (workers config option > 1)?<br>

><br>

><br>

> I haven't explicitly set this option as I didn't know it existed. Do you<br>

> have a reference for this option? I did a quick scan/grep of the neutron<br>

> config files and didn't see a reference to workers.<br>

<br>

Sure. So this patch added in the multiplexed Neutron server functionality:<br>

<br>

<a href="https://review.openstack.org/#/c/37131/" target="_blank">https://review.openstack.org/#/c/37131/</a><br>

<br>

It's not in Havana, but we backport that sucker into our Grizzly and<br>

Havana deployments:<br>

<br>

<a href="https://gist.github.com/alanmeadows/7770570" target="_blank">https://gist.github.com/alanmeadows/7770570</a><br>

<a href="https://review.openstack.org/#/c/63020/" target="_blank">https://review.openstack.org/#/c/63020/</a><br>

<br>

I would highly suggest you do the same, set workers=10 or so and retry<br>

your concurrent launch scenario...<br>
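<br>

As back-of-the-envelope arithmetic for why more workers might help (a<br>
deliberately simplified model, not how neutron-server actually schedules<br>
requests): if each worker handles one request at a time and all requests take<br>
about the same time, 15 simultaneous requests need ceil(15 / workers) rounds.<br>
The function name rounds_needed is made up for this sketch.<br>
<pre>
```python
import math


def rounds_needed(requests, workers):
    """Rounds of work if each worker serves one request per round."""
    return math.ceil(requests / workers)


# 15 people launching an instance at about the same time.
for workers in (1, 10):
    print(workers, "worker(s):", rounds_needed(15, workers), "round(s)")
```
</pre>
Under this model a single worker serialises all 15 requests, while 10 workers<br>
get through them in 2 rounds; real gains depend on the database and MQ as well.<br>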

<br>

Best,<br>

-jay<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 9<br>

Date: Wed, 18 Dec 2013 22:57:27 -0700<br>

From: Joe Topjian <<a href="mailto:joe@topjian.net">joe@topjian.net</a>><br>

To: Jay Pipes <<a href="mailto:jaypipes@gmail.com">jaypipes@gmail.com</a>><br>

Cc: <a href="mailto:openstack-operators@lists.openstack.org">openstack-operators@lists.openstack.org</a><br>

Subject: Re: [Openstack-operators] Neutron crashed hard<br>

Message-ID:<br>

        <<a href="mailto:CA%2By7hviSK88JwJjGdD0v3h0Bc-DY%2BADxOhMAf5T8Ut5SeYkzKQ@mail.gmail.com">CA+y7hviSK88JwJjGdD0v3h0Bc-DY+ADxOhMAf5T8Ut5SeYkzKQ@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="iso-8859-1"<br>

<br>

><br>

> Sure. So this patch added in the multiplexed Neutron server functionality:<br>

><br>

> <a href="https://review.openstack.org/#/c/37131/" target="_blank">https://review.openstack.org/#/c/37131/</a><br>

><br>

> It's not in Havana, but we backport that sucker into our Grizzly and<br>

> Havana deployments:<br>

><br>

> <a href="https://gist.github.com/alanmeadows/7770570" target="_blank">https://gist.github.com/alanmeadows/7770570</a><br>

> <a href="https://review.openstack.org/#/c/63020/" target="_blank">https://review.openstack.org/#/c/63020/</a><br>

><br>

> I would highly suggest you do the same, set workers=10 or so and retry<br>

> your concurrent launch scenario...<br>

><br>

<br>

Oh, very cool! This definitely feels like the right direction. I won't have<br>

time to try this for a few days, but I will definitely patch my environment<br>

and try to simulate today's event. I'll follow up with the results.<br>


<br>

------------------------------<br>

<br>

_______________________________________________<br>

OpenStack-operators mailing list<br>

<a href="mailto:OpenStack-operators@lists.openstack.org">OpenStack-operators@lists.openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators</a><br>

<br>

<br>

End of OpenStack-operators Digest, Vol 38, Issue 14<br>

***************************************************<br>

</blockquote></div><br></div>