[Openstack-operators] [Essex] compute node hard reboot, can't create domain.

Narayan Desai narayan.desai at gmail.com
Mon Jul 8 03:12:55 UTC 2013


You'll also need to set up the tagged interface (eth2@14) and add it to the
br14 bridge.
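
Roughly, and assuming eth2 really is the trunk carrying VLAN 14 (substitute
whichever NIC your vlan_interface flag actually points at), a sketch with the
Essex-era tooling would look something like:

  brctl addbr br14                                    # only if the bridge is still missing
  ip link add link eth2 name vlan14 type vlan id 14   # or: vconfig add eth2 14 (default name eth2.14)
  brctl addif br14 vlan14
  ip link set vlan14 up
  ip link set br14 up

As noted below, nova should normally recreate these itself, so treat the above
as a stopgap until the services do it for you.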
 -nld


On Sun, Jul 7, 2013 at 9:32 PM, Samuel Winchenbach <swinchen at gmail.com> wrote:

> Lorin,
>
> I am running in VLAN mode (not multihost mode).  Restarting nova-compute
> does not seem to create the bridge "br14".  I tried creating it manually with
> "brctl addbr br14".  Doing this allowed me to start the VM, but I cannot
> ping or ssh to the VM on either the internal or external network.  I
> probably need to create the vlan14@eth0 interface as well and add it to
> the bridge?
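>
> For reference, a rough check of what is actually attached to the bridge:
>
>   brctl show br14          # lists the interfaces enslaved to the bridge (likely none right now)
>   ip -d link show vlan14   # -d shows the 802.1Q tag, if the VLAN device exists at all
>
> With only an empty bridge and no tagged uplink, traffic from the VM has
> nowhere to go, which would explain the failed pings.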
>
> Why might nova-compute not recreate the bridges and interfaces?  I don't
> see any warnings or errors in the log files (on either node) when starting
> nova-compute on the failed compute node.
>
> ************ FROM THE CONTROLLER NODE ************
> root@cloudy:~# nova-manage service list
> 2013-07-07 22:25:40 DEBUG nova.utils [req-f4a55a39-03d8-4bc7-b5c8-f53b1825f934 None None] backend <module 'nova.db.sqlalchemy.api' from '/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.pyc'> from (pid=29059) __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:658
> Binary           Host         Zone   Status    State  Updated_At
> nova-scheduler   cloudy       nova   enabled   :-)    2013-07-08 02:25:40
> nova-compute     cloudy       nova   enabled   :-)    2013-07-08 02:25:25
> nova-network     cloudy       nova   enabled   :-)    2013-07-08 02:25:40
> nova-compute     compute-01   nova   enabled   :-)    2013-07-08 02:25:40
> nova-compute     compute-02   nova   enabled   XXX    2013-05-21 17:47:13  <--- this is OK, I have it turned off.
>
>
> ************ FROM THE FAILED NODE ************
> root@compute-01:~# service nova-compute restart
> nova-compute stop/waiting
> nova-compute start/running, process 13057
> root@compute-01:~# sleep 10
> root@compute-01:~# ip a
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>     link/ether 00:25:90:56:d9:d2 brd ff:ff:ff:ff:ff:ff
>     inet 10.54.50.30/16 brd 10.54.255.255 scope global eth0   <--- address obtained via DHCP for the external route
>     inet 10.20.0.2/16 scope global eth0                       <--- OpenStack management/internal network
>     inet6 fe80::225:90ff:fe56:d9d2/64 scope link
>        valid_lft forever preferred_lft forever
> 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>     link/ether 00:25:90:56:d9:d3 brd ff:ff:ff:ff:ff:ff
> 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
>     link/ether 00:02:c9:34:e9:90 brd ff:ff:ff:ff:ff:ff
>     inet 10.57.60.2/16 brd 10.57.255.255 scope global eth2    <--- on the NFS network (for live migration)
>     inet6 fe80::202:c9ff:fe34:e990/64 scope link
>        valid_lft forever preferred_lft forever
> 5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>     link/ether 00:02:c9:34:e9:91 brd ff:ff:ff:ff:ff:ff
> 6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
>     link/ether 56:ed:f9:dd:bc:58 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
>
>
> On Sun, Jul 7, 2013 at 10:03 PM, Lorin Hochstein <lorin at nimbisservices.com> wrote:
>
>> Hi Samuel:
>>
>> It sounds like your VMs are configured to plug into a Linux bridge that
>> doesn't exist on compute-01 anymore. You could create it manually, although
>> I would expect it to have been created automatically by the relevant nova
>> service when it came back up.
>>
>> You can check if the bridge is there by doing "ip a" and looking for the
>> "br14" network device.
>>
>> Are you running networking in multihost mode? If so, I think restarting
>> the nova-network service on compute-01 should do it. If you aren't running
>> in multihost mode, then it should come back by restarting the nova-compute
>> service on compute-01.
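>>
>> For example, with the upstart service names used elsewhere in this thread,
>> that would be roughly:
>>
>>   service nova-network restart   # multihost case, on compute-01
>>   service nova-compute restart   # non-multihost case, on compute-01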
>>
>> Otherwise, you'll need to create the bridge manually, and how you do that
>> will depend on whether you're running flat or VLAN networking. Since it's
>> called br14, I'm assuming you're running in VLAN mode with VLAN tag 14
>> associated with this project?
>>
>> Lorin
>>
>>
>> On Sun, Jul 7, 2013 at 9:21 PM, Samuel Winchenbach <swinchen at gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I have an old Essex cluster that we are getting ready to phase out for
>>> Grizzly.  Unfortunately, over the weekend one of the compute nodes powered
>>> off (a power supply failure, it looks like).  When I tried "nova reboot
>>> <UUID>" I got:
>>>
>>> 2013-07-07 21:17:34 ERROR nova.rpc.amqp [req-d2ea5f46-9dc2-4788-9951-07d985a1f8dc 6986639ba3c84ab5b05fdd2e122101f0 3806a811d2d34542bdfc5d7f31ce7b89] Exception during message handling
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp Traceback (most recent call last):
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 253, in _process_data
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     rval = node_func(context=ctxt, **node_args)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     return f(*args, **kw)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 159, in decorated_function
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     function(self, context, instance_uuid, *args, **kwargs)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 183, in decorated_function
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     sys.exc_info())
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     self.gen.next()
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 177, in decorated_function
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     return function(self, context, instance_uuid, *args, **kwargs)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 904, in reboot_instance
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     reboot_type)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     return f(*args, **kw)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 721, in reboot
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     if self._soft_reboot(instance):
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 757, in _soft_reboot
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     dom.create()
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 551, in create
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp libvirtError: Cannot get interface MTU on 'br14': No such device
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp
>>>
>>>
>>> So I tried starting it manually:
>>>
>>> root@compute-01:/etc/libvirt/qemu# virsh create instance-00000035.xml
>>> error: Failed to create domain from instance-00000035.xml
>>> error: Cannot get interface MTU on 'br14': No such device
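>>>
>>> A rough way to confirm the mismatch (assuming the domain XML is the one
>>> above in /etc/libvirt/qemu) is to compare the bridge the XML asks for with
>>> what the host actually has:
>>>
>>>   grep 'source bridge' instance-00000035.xml   # shows which bridge the guest expects (br14)
>>>   ip link show br14                            # "does not exist" confirms the bridge is gone
>>>
>>> libvirt looks up the bridge's MTU when it plugs the guest's tap device into
>>> it, so a missing br14 fails virDomainCreate() with exactly this error.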
>>>
>>>
>>> Any idea what I might be doing wrong?  All the services show :-) with
>>> nova-manage
>>>
>>>
>>> Thanks for your help...
>>>
>>> Sam
>>>
>>>
>>
>>
>> --
>> Lorin Hochstein
>> Lead Architect - Cloud Services
>> Nimbis Services, Inc.
>> www.nimbisservices.com
>>
>
>
>
>