[Openstack-operators] OpenStack-operators Digest, Vol 38, Issue 14

yungho yungho5054 at gmail.com
Sun Dec 22 11:02:08 UTC 2013


I am running OpenStack with nova-compute managing vCenter. After creating an
instance, I try to open its VNC console from the dashboard interface. Following
the official documentation, I added the following to nova.conf:
compute_driver = vmwareapi.VMwareVCDriver
vmwareapi_host_ip = 200.21.0.99
vmwareapi_host_username = administrator
vmwareapi_host_password = root123.
vmwareapi_cluster_name = openstack_vs
vmwareapi_wsdl_loc =
http://200.21.4.18:8080/vmware/SDK/vsphere-ws/wsdl/vim25/vimService.wsdl

The VNC configuration options in nova.conf are as follows:
vncserver_listen=0.0.0.0
vnc_enabled=true
novncproxy_base_url=http://200.21.4.17:6080/vnc_auto.html
novncproxy_port=6080


My control node, compute node, and vCenter host are 200.21.4.17, 200.21.4.18,
and 200.21.0.99, respectively.
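
As a basic check (a sketch, assuming the nova client is configured against the
controller), the console URL can also be requested directly:

nova get-vnc-console <instance-id> novnc

The returned URL should point at http://200.21.4.17:6080/vnc_auto.html; if it
does but the console still fails to load, it is worth verifying that
nova-consoleauth and nova-novncproxy are running on 200.21.4.17 and that the
proxy can reach the VNC port the VMware driver assigns on the ESX side.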


On Thu, Dec 19, 2013 at 8:00 PM, <
openstack-operators-request at lists.openstack.org> wrote:

> Send OpenStack-operators mailing list submissions to
>         openstack-operators at lists.openstack.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
> or, via email, send a message with subject or body 'help' to
>         openstack-operators-request at lists.openstack.org
>
> You can reach the person managing the list at
>         openstack-operators-owner at lists.openstack.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of OpenStack-operators digest..."
>
>
> Today's Topics:
>
>    1. AUTO: Ophir Solonikov is out of the office        (returning
>       22/12/2013) (Ophir Solonikov)
>    2. Re: IGNORED qpid_hosts in glance-api.conf (Russell Bryant)
>    3. Neutron crashed hard (Joe Topjian)
>    4. Re: Neutron crashed hard (Erik McCormick)
>    5. Re: Neutron crashed hard (Jay Pipes)
>    6. Re: Neutron crashed hard (Joe Topjian)
>    7. Re: Neutron crashed hard (Joe Topjian)
>    8. Re: Neutron crashed hard (Jay Pipes)
>    9. Re: Neutron crashed hard (Joe Topjian)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 18 Dec 2013 16:07:03 +0200
> From: Ophir Solonikov <OPHIRS at il.ibm.com>
> To: openstack-operators at lists.openstack.org
> Subject: [Openstack-operators] AUTO: Ophir Solonikov is out of the
>         office  (returning 22/12/2013)
> Message-ID:
>         <
> OFA9F78110.BC21C4C6-ONC2257C45.004D8CE2-C2257C45.004D8CE2 at il.ibm.com>
> Content-Type: text/plain; charset=US-ASCII
>
>
> I am out of the office until 22/12/2013.
>
>
>
>
> Note: This is an automated response to your message  "OpenStack-operators
> Digest, Vol 38, Issue 13" sent on 18/12/2013 14:00:02.
>
> This is the only notification you will receive while this person is away.
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 18 Dec 2013 20:55:44 -0500
> From: Russell Bryant <rbryant at redhat.com>
> To: openstack-operators at lists.openstack.org
> Subject: Re: [Openstack-operators] IGNORED qpid_hosts in
>         glance-api.conf
> Message-ID: <52B25220.9060006 at redhat.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On 12/16/2013 08:47 AM, Alvise Dorigo wrote:
> > Hi,
> > in this
> > page
> http://openstack.redhat.com/Highly_Available_Qpid_for_OpenStack#Clustered_without_pacemaker
> > they explain how to make a glance-api service to connect to a redundant
> > qpid broker server on two or more hosts. The parameter to use is
> > qpid_hosts, but it seems to be ignored.
> >
> > In fact, if I switch qpid off on the first host listed in the
> > qpid_hosts, the glance-api will continue to try to connect to it
> > receiving "connection refused"... and never tries to connect to the
> > other one, in which qpid is still alive.
> >
> > Any hint ?
> >
> > In the following the relevant part of glance-api.conf
> > ==========================
> > notifier_strategy=qpid
> >
> > # Configuration options if sending notifications via Qpid (these are
> > # the defaults)
> > qpid_notification_exchange = glance
> > qpid_notification_topic = notifications
> > qpid_hosts=192.168.122.41:5672,192.168.122.42:5672
> > #qpid_hostname = localhost
> > qpid_username =
> > qpid_password =
> > qpid_sasl_mechanisms =
> > qpid_reconnect_timeout = 0
> > qpid_reconnect_limit = 0
> > qpid_reconnect_interval_min = 0
> > qpid_reconnect_interval_max = 0
> > qpid_reconnect_interval = 0
> > qpid_heartbeat=2
> > # Set to 'ssl' to enable SSL
> > qpid_protocol = tcp
> > qpid_tcp_nodelay = True
> > ==========================
>
> I suspect qpid_hosts isn't supported by glance.  Until very recently
> (Icehouse), glance used its own qpid and rabbit code for notifications,
> separate from the code used by every other service.
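>
> A possible interim workaround (just a sketch, untested) would be to front
> the two brokers with a failover VIP, for example one managed by keepalived,
> and point the single-host option at it in glance-api.conf:
>
> notifier_strategy = qpid
> # hypothetical VIP floating between 192.168.122.41 and 192.168.122.42
> qpid_hostname = 192.168.122.40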
>
> --
> Russell Bryant
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 18 Dec 2013 19:33:35 -0700
> From: Joe Topjian <joe at topjian.net>
> To: openstack-operators at lists.openstack.org
> Subject: [Openstack-operators] Neutron crashed hard
> Message-ID:
>         <
> CA+y7hviSZVVaNk47dWVqe1boLm5vSOq5D8qsdjyjEjwT7VAVwg at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello,
>
> I set up an internal OpenStack cloud to give a workshop for around 15
> people. I decided to use Neutron as I'm trying to get more experience with
> it. The cloud consisted of a cloud controller and four compute nodes. Very
> decent Dell hardware, Ubuntu 12.04, Havana 2013.2.0.
>
> Neutron was configured with the OVS plugin, non-overlapping IPs, and a
> single shared subnet. GRE tunnelling was used between compute nodes.
>
> Everything was working fine until the 15 people tried launching a CirrOS
> instance at approximately the same time.
>
> Then Neutron crashed.
>
> The compute nodes had this in their logs:
>
> 2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager ConnectionFailed:
> Connection to neutron failed: timed out
>
> All instances went into an Error state.
>
> Restarting the Neutron services did no good. Terminating the Error'd
> instances seemed to make the problem worse -- the entire cloud became
> unavailable (meaning, both Horizon and Nova were unusable as they would
> time out waiting for Neutron).
>
> We moved on to a different cloud to continue on with the workshop. I would
> occasionally issue "neutron net-list" in the original cloud to see if I
> would get a result. It took about an hour.
>
> What happened?
>
> I've read about Neutron performance issues -- would this be something along
> those lines?
>
> What's the best way to quickly recover from a situation like this?
>
> Since then, I haven't recreated the database, networks, or anything like
> that. Is there a specific log or database table I can look for to see more
> information on how exactly this situation happened?
>
> Thanks,
> Joe
>
> ------------------------------
>
> Message: 4
> Date: Wed, 18 Dec 2013 23:37:57 -0500
> From: Erik McCormick <emccormick at cirrusseven.com>
> To: Joe Topjian <joe at topjian.net>
> Cc: openstack-operators at lists.openstack.org
> Subject: Re: [Openstack-operators] Neutron crashed hard
> Message-ID:
>         <
> CAHUi5cOjdP7B2wdqkbufGKVvejpH53UY4qG6zBs9L66W9M1PJQ at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> It sounds more to me like your database went awol than a neutron problem.
> Assuming you had done a bit of mucking around testing the cluster before
> this event, is there any chance you're not using memcached and your tokens
> table has grown large? You might want to switch over to memcached for
> Keystone and see if that doesn't make it happier.
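>
> A minimal sketch of that change, assuming a Havana-era keystone.conf and a
> memcached instance on the controller (adjust the address as needed):
>
> [token]
> driver = keystone.token.backends.memcache.Token
>
> [memcache]
> servers = 127.0.0.1:11211
>
> followed by restarting keystone.
>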
> On Dec 18, 2013 9:40 PM, "Joe Topjian" <joe at topjian.net> wrote:
>
> > Hello,
> >
> > I set up an internal OpenStack cloud to give a workshop for around 15
> > people. I decided to use Neutron as I'm trying to get more experience
> with
> > it. The cloud consisted of a cloud controller and four compute nodes.
> Very
> > decent Dell hardware, Ubuntu 12.04, Havana 2013.2.0.
> >
> > Neutron was configured with the OVS plugin, non-overlapping IPs, and a
> > single shared subnet. GRE tunnelling was used between compute nodes.
> >
> > Everything was working fine until the 15 people tried launching a CirrOS
> > instance at approximately the same time.
> >
> > Then Neutron crashed.
> >
> > The compute nodes had this in their logs:
> >
> > 2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager
> ConnectionFailed:
> > Connection to neutron failed: timed out
> >
> > All instances went into an Error state.
> >
> > Restarting the Neutron services did no good. Terminating the Error'd
> > instances seemed to make the problem worse -- the entire cloud became
> > unavailable (meaning, both Horizon and Nova were unusable as they would
> > time out waiting for Neutron).
> >
> > We moved on to a different cloud to continue on with the workshop. I
> would
> > occasionally issue "neutron net-list" in the original cloud to see if I
> > would get a result. It took about an hour.
> >
> > What happened?
> >
> > I've read about Neutron performance issues -- would this be something
> > along those lines?
> >
> > What's the best way to quickly recover from a situation like this?
> >
> > Since then, I haven't recreated the database, networks, or anything like
> > that. Is there a specific log or database table I can look for to see
> more
> > information on how exactly this situation happened?
> >
> > Thanks,
> > Joe
> >
> > _______________________________________________
> > OpenStack-operators mailing list
> > OpenStack-operators at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >
> >
>
> ------------------------------
>
> Message: 5
> Date: Thu, 19 Dec 2013 00:05:00 -0500
> From: Jay Pipes <jaypipes at gmail.com>
> To: openstack-operators at lists.openstack.org
> Subject: Re: [Openstack-operators] Neutron crashed hard
> Message-ID: <52B27E7C.7070008 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 12/18/2013 09:33 PM, Joe Topjian wrote:
> > Hello,
> >
> > I set up an internal OpenStack cloud to give a workshop for around 15
> > people. I decided to use Neutron as I'm trying to get more experience
> > with it. The cloud consisted of a cloud controller and four compute
> > nodes. Very decent Dell hardware, Ubuntu 12.04, Havana 2013.2.0.
> >
> > Neutron was configured with the OVS plugin, non-overlapping IPs, and a
> > single shared subnet. GRE tunnelling was used between compute nodes.
>
> What version of OVS did you deploy? There's a bad bug/behavior in OVS
> 1.04 that can result in circular routes in the GRE mesh, which we saw
> take down an entire deployment zone with tenant traffic swamping the
> bonded NIC that was housing the GRE overlay network. Upgrading to OVS
> 1.10 and then 1.11, along with some scripting, solved that issue...
>
> > Everything was working fine until the 15 people tried launching a CirrOS
> > instance at approximately the same time.
> >
> > Then Neutron crashed.
> >
> > The compute nodes had this in their logs:
> >
> > 2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager
> > ConnectionFailed: Connection to neutron failed: timed out
> >
> > All instances went into an Error state.
> >
> > Restarting the Neutron services did no good. Terminating the Error'd
> > instances seemed to make the problem worse -- the entire cloud became
> > unavailable (meaning, both Horizon and Nova were unusable as they would
> > time out waiting for Neutron).
> >
> > We moved on to a different cloud to continue on with the workshop. I
> > would occasionally issue "neutron net-list" in the original cloud to see
> > if I would get a result. It took about an hour.
> >
> > What happened?
> >
> > I've read about Neutron performance issues -- would this be something
> > along those lines?
>
> Tough to tell. It very well could be, or it could be OVS itself.
>
> Look in the Neutron L3 agent, neutron-plugin-openvswitch-agent log (both
> on the L3 router node and the compute workers) and neutron-server logs
> for errors. It may be some contention issues on the database or MQ end
> of things.
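>
> For example, something along these lines on the controller and the computes
> (log paths assume the Ubuntu packages):
>
> grep -iE "error|trace" /var/log/neutron/*.log
> grep -iE "error|warn" /var/log/openvswitch/ovs-vswitchd.log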
>
> Are you using a multi-plexed neutron server (workers config option > 1)?
>
> > What's the best way to quickly recover from a situation like this?
>
> There isn't one. Search through all your logs for Neutron and
> openvswitchd looking for issues.
>
> > Since then, I haven't recreated the database, networks, or anything like
> > that. Is there a specific log or database table I can look for to see
> > more information on how exactly this situation happened?
>
> You could look at your database slow log (if using MySQL), but I doubt
> you'll find anything in there... but you may get lucky.
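>
> If the slow log isn't enabled yet, something like this in my.cnf would turn
> it on (a sketch; option names are for MySQL 5.5):
>
> [mysqld]
> slow_query_log = 1
> slow_query_log_file = /var/log/mysql/mysql-slow.log
> long_query_time = 1
>
> or SET GLOBAL slow_query_log = 1; at runtime.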
>
> Let us know what you find.
>
> Best,
> -jay
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Wed, 18 Dec 2013 22:28:52 -0700
> From: Joe Topjian <joe at topjian.net>
> To: Erik McCormick <emccormick at cirrusseven.com>
> Cc: openstack-operators at lists.openstack.org
> Subject: Re: [Openstack-operators] Neutron crashed hard
> Message-ID:
>         <CA+y7hvhF27W_2vjHzYcZi=
> A0sNTyAVuG7+DUwXFdW3RbQ+hUqg at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Thanks for the input. I'm using memcache as a token store already, though.
>
>
> On Wed, Dec 18, 2013 at 9:37 PM, Erik McCormick
> <emccormick at cirrusseven.com>wrote:
>
> > It sounds more to me like your database went awol than a neutron problem.
> > Assuming you had done a bit of mucking around testing the cluster before
> > this event, is there any chance you're not using memcached and your
> tokens
> > table has grown large? You might want to switch over to memcached for
> > Keystone and see if that doesn't make it happier.
> > On Dec 18, 2013 9:40 PM, "Joe Topjian" <joe at topjian.net> wrote:
> >
> >> Hello,
> >>
> >> I set up an internal OpenStack cloud to give a workshop for around 15
> >> people. I decided to use Neutron as I'm trying to get more experience
> with
> >> it. The cloud consisted of a cloud controller and four compute nodes.
> Very
> >> decent Dell hardware, Ubuntu 12.04, Havana 2013.2.0.
> >>
> >> Neutron was configured with the OVS plugin, non-overlapping IPs, and a
> >> single shared subnet. GRE tunnelling was used between compute nodes.
> >>
> >> Everything was working fine until the 15 people tried launching a CirrOS
> >> instance at approximately the same time.
> >>
> >> Then Neutron crashed.
> >>
> >> The compute nodes had this in their logs:
> >>
> >> 2013-12-18 09:52:57.707 28514 TRACE nova.compute.manager
> >> ConnectionFailed: Connection to neutron failed: timed out
> >>
> >> All instances went into an Error state.
> >>
> >> Restarting the Neutron services did no good. Terminating the Error'd
> >> instances seemed to make the problem worse -- the entire cloud became
> >> unavailable (meaning, both Horizon and Nova were unusable as they would
> >> time out waiting for Neutron).
> >>
> >> We moved on to a different cloud to continue on with the workshop. I
> >> would occasionally issue "neutron net-list" in the original cloud to
> see if
> >> I would get a result. It took about an hour.
> >>
> >> What happened?
> >>
> >> I've read about Neutron performance issues -- would this be something
> >> along those lines?
> >>
> >> What's the best way to quickly recover from a situation like this?
> >>
> >> Since then, I haven't recreated the database, networks, or anything like
> >> that. Is there a specific log or database table I can look for to see
> more
> >> information on how exactly this situation happened?
> >>
> >> Thanks,
> >> Joe
> >>
> >> _______________________________________________
> >> OpenStack-operators mailing list
> >> OpenStack-operators at lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >>
> >>
>
> ------------------------------
>
> Message: 7
> Date: Wed, 18 Dec 2013 22:36:50 -0700
> From: Joe Topjian <joe at topjian.net>
> To: Jay Pipes <jaypipes at gmail.com>
> Cc: openstack-operators at lists.openstack.org
> Subject: Re: [Openstack-operators] Neutron crashed hard
> Message-ID:
>         <
> CA+y7hviOYgRc6phjrzW0zTQZcjm_Rr018SFSEqUZ3+gf_ZKB9g at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Jay,
>
> What version of OVS did you deploy? There's a bad bug/behavior in OVS 1.04
> > that can result in circular routes in the GRE mesh, which we saw entirely
> > take down an entire deployment zone with tenant traffic swamping the
> bonded
> > NIC that was housing the GRE overlay network. Upgrading to OVS 1.10 and
> > then 1.11 solved that issue along with some scripting...
> >
>
> OVS 1.10 is available in the Ubuntu Havana repo. I'm using the Raring 3.8
> kernel available in the standard 12.04 repo, which also includes the OVS 1.9
> kernel module to compile against the 3.8 kernel.
>
> So, long story short: OVS 1.10 with 1.9 kernel module.
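>
> For reference, the running versions can be double-checked with (assuming the
> standard Ubuntu packages):
>
> ovs-vsctl --version
> dpkg -l | grep openvswitch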
>
>
> > Are you using a multi-plexed neutron server (workers config option > 1)?
> >
>
> I haven't explicitly set this option as I didn't know it existed. Do you
> have a reference for this option? I did a quick scan/grep of the neutron
> config files and didn't see a reference to workers.
>
> Thanks,
> Joe
>
> ------------------------------
>
> Message: 8
> Date: Thu, 19 Dec 2013 00:48:51 -0500
> From: Jay Pipes <jaypipes at gmail.com>
> To: Joe Topjian <joe at topjian.net>
> Cc: openstack-operators at lists.openstack.org
> Subject: Re: [Openstack-operators] Neutron crashed hard
> Message-ID: <52B288C3.7040408 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 12/19/2013 12:36 AM, Joe Topjian wrote:
> > Hi Jay,
> >
> >     What version of OVS did you deploy? There's a bad bug/behavior in
> >     OVS 1.04 that can result in circular routes in the GRE mesh, which
> >     we saw entirely take down an entire deployment zone with tenant
> >     traffic swamping the bonded NIC that was housing the GRE overlay
> >     network. Upgrading to OVS 1.10 and then 1.11 solved that issue along
> >     with some scripting...
> >
> >
> > OVS 1.10 is available in the Ubuntu havana repo. I'm using the Raring
> > 3.8 kernel available in the standard 12.04 repo. The standard 12.04 repo
> > also includes OVS 1.9 kernel module to compile with the 3.8 kernel.
> >
> > So, long story short: OVS 1.10 with 1.9 kernel module.
>
> OK, then I think you are clear of the circular routing issue we saw.
>
> >     Are you using a multi-plexed neutron server (workers config option >
> 1)?
> >
> >
> > I haven't explicitly set this option as I didn't know it existed. Do you
> > have a reference for this option? I did a quick scan/grep of the neutron
> > config files and didn't see a reference to workers.
>
> Sure. So this patch added in the multiplexed Neutron server functionality:
>
> https://review.openstack.org/#/c/37131/
>
> It's not in Havana, but we backport that sucker into our Grizzly and
> Havana deployments:
>
> https://gist.github.com/alanmeadows/7770570
> https://review.openstack.org/#/c/63020/
>
> I would highly suggest you do the same, set workers=10 or so and retry
> your concurrent launch scenario...
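>
> With the patch applied, the change is roughly this in neutron.conf (a
> sketch; check the backport for the exact option name, which upstream later
> landed as api_workers):
>
> [DEFAULT]
> api_workers = 10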
>
> Best,
> -jay
>
>
>
> ------------------------------
>
> Message: 9
> Date: Wed, 18 Dec 2013 22:57:27 -0700
> From: Joe Topjian <joe at topjian.net>
> To: Jay Pipes <jaypipes at gmail.com>
> Cc: openstack-operators at lists.openstack.org
> Subject: Re: [Openstack-operators] Neutron crashed hard
> Message-ID:
>         <
> CA+y7hviSK88JwJjGdD0v3h0Bc-DY+ADxOhMAf5T8Ut5SeYkzKQ at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> >
> > Sure. So this patch added in the multiplexed Neutron server
> functionality:
> >
> > https://review.openstack.org/#/c/37131/
> >
> > It's not in Havana, but we backport that sucker into our Grizzly and
> > Havana deployments:
> >
> > https://gist.github.com/alanmeadows/7770570
> > https://review.openstack.org/#/c/63020/
> >
> > I would highly suggest you do the same, set workers=10 or so and retry
> > your concurrent launch scenario...
> >
>
> Oh, very cool! This definitely feels like the right direction. I won't have
> time to try this for a few days, but I will definitely patch my environment
> and try to simulate today's event. I'll follow up with the results.
>
> ------------------------------
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>
> End of OpenStack-operators Digest, Vol 38, Issue 14
> ***************************************************
>

