[Openstack-operators] [Essex] compute node hard reboot, can't create domain.
Narayan Desai
narayan.desai at gmail.com
Mon Jul 8 03:12:55 UTC 2013
You'll also need to set up the tagged interface on eth2 (VLAN 14) and add it to
the br14 bridge.
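Something like this should do it (untested, and the names are my guesses: I'm
assuming VLAN id 14 on eth2 and calling the tagged interface vlan14, so adjust
to whatever your vlan_interface setting actually uses):

# load 802.1q support and create the tagged interface for VLAN 14 on eth2
modprobe 8021q
ip link add link eth2 name vlan14 type vlan id 14
ip link set vlan14 up
# attach it to the bridge the instances expect, and bring the bridge up
brctl addif br14 vlan14
ip link set br14 up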
-nld
On Sun, Jul 7, 2013 at 9:32 PM, Samuel Winchenbach <swinchen at gmail.com> wrote:
> Lorin,
>
> I am running in VLAN mode (not multi-host mode). Restarting nova-compute
> does not seem to create the bridge "br14". I tried creating it manually with
> "brctl addbr br14". Doing this allowed me to start the VM, but I cannot
> ping or ssh to the VM on either the internal or external network. I
> probably also need to create the vlan14@eth0 interface and add it to
> the bridge?
>
> Why might nova-compute not recreate the bridges and interfaces? I don't
> see any warnings or errors in the log files (on either node) when starting
> nova-compute on the failed compute node.
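> In case it helps, this is what I was planning to run to double-check what
> nova expects for this network (I'm guessing at the exact config key names):
>
> # list the networks nova-network knows about, which should include the vlan id
> nova-manage network list
> # see which physical interface and network manager nova is configured to use
> grep -E 'vlan_interface|network_manager' /etc/nova/nova.conf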
>
> ************ FROM THE CONTROLLER NODE ************
> root at cloudy:~# nova-manage service list
> 2013-07-07 22:25:40 DEBUG nova.utils [req-f4a55a39-03d8-4bc7-b5c8-f53b1825f934 None None] backend <module 'nova.db.sqlalchemy.api' from '/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.pyc'> from (pid=29059) __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:658
> Binary           Host         Zone   Status    State  Updated_At
> nova-scheduler   cloudy       nova   enabled   :-)    2013-07-08 02:25:40
> nova-compute     cloudy       nova   enabled   :-)    2013-07-08 02:25:25
> nova-network     cloudy       nova   enabled   :-)    2013-07-08 02:25:40
> nova-compute     compute-01   nova   enabled   :-)    2013-07-08 02:25:40
> nova-compute     compute-02   nova   enabled   XXX    2013-05-21 17:47:13  <--- this is ok, I have it turned off.
>
>
> ************ FROM THE FAILED NODE ************
> root at compute-01:~# service nova-compute restart
> nova-compute stop/waiting
> nova-compute start/running, process 13057
> root at compute-01:~# sleep 10
> root at compute-01:~# ip a
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
> inet6 ::1/128 scope host
> valid_lft forever preferred_lft forever
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
> link/ether 00:25:90:56:d9:d2 brd ff:ff:ff:ff:ff:ff
> inet 10.54.50.30/16 brd 10.54.255.255 scope global eth0   <--- address obtained via DHCP (external route)
> inet 10.20.0.2/16 scope global eth0   <--- OpenStack management/internal network
> inet6 fe80::225:90ff:fe56:d9d2/64 scope link
> valid_lft forever preferred_lft forever
> 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
> link/ether 00:25:90:56:d9:d3 brd ff:ff:ff:ff:ff:ff
> 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
> link/ether 00:02:c9:34:e9:90 brd ff:ff:ff:ff:ff:ff
> inet 10.57.60.2/16 brd 10.57.255.255 scope global eth2   <--- on the NFS network (for live migration)
> inet6 fe80::202:c9ff:fe34:e990/64 scope link
> valid_lft forever preferred_lft forever
> 5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
> link/ether 00:02:c9:34:e9:91 brd ff:ff:ff:ff:ff:ff
> 6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
> link/ether 56:ed:f9:dd:bc:58 brd ff:ff:ff:ff:ff:ff
> inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
>
>
> On Sun, Jul 7, 2013 at 10:03 PM, Lorin Hochstein <lorin at nimbisservices.com> wrote:
>
>> Hi Samuel:
>>
>> It sounds like your VMs are configured to plug into a Linux bridge that
>> doesn't exist on compute-01 anymore. You could create it manually, although
>> I would expect it to have been recreated automatically by the relevant nova
>> service when it came back up.
>>
>> You can check if the bridge is there by doing "ip a" and looking for the
>> "br14" network device.
>>
>> Are you running networking in multihost mode? If so, I think restarting
>> the nova-network service on compute-01 should do it. If you aren't running
>> in multihost mode, then it should come back by restarting the nova-compute
>> service on compute-01.
>>
>> Otherwise, you'll need to create the bridge manually, and how you do that
>> will depend on whether you're running flat or vlan networking. Since it's
>> called br14, I'm assuming you're running in vlan mode with vlan tag 14
>> associated with this project?
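>> If it is vlan mode, a quick sanity check for both pieces would be something
>> like this (the vlan interface name is just a guess; match it to your
>> vlan_interface setting):
>>
>> # is the bridge there, and what is enslaved to it?
>> brctl show br14
>> # does the tagged interface exist?
>> ip link show vlan14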
>>
>> Lorin
>>
>>
>> On Sun, Jul 7, 2013 at 9:21 PM, Samuel Winchenbach <swinchen at gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I have an old Essex cluster that we are getting ready to phase out in
>>> favor of Grizzly. Unfortunately, over the weekend one of the compute nodes
>>> powered off (it looks like a power supply failure). When I tried
>>> "nova reboot <UUID>", I got:
>>>
>>> 2013-07-07 21:17:34 ERROR nova.rpc.amqp [req-d2ea5f46-9dc2-4788-9951-07d985a1f8dc 6986639ba3c84ab5b05fdd2e122101f0 3806a811d2d34542bdfc5d7f31ce7b89] Exception during message handling
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp Traceback (most recent call last):
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 253, in _process_data
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     rval = node_func(context=ctxt, **node_args)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     return f(*args, **kw)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 159, in decorated_function
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     function(self, context, instance_uuid, *args, **kwargs)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 183, in decorated_function
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     sys.exc_info())
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     self.gen.next()
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 177, in decorated_function
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     return function(self, context, instance_uuid, *args, **kwargs)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 904, in reboot_instance
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     reboot_type)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     return f(*args, **kw)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 721, in reboot
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     if self._soft_reboot(instance):
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 757, in _soft_reboot
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     dom.create()
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 551, in create
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp     if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp libvirtError: Cannot get interface MTU on 'br14': No such device
>>> 2013-07-07 21:17:34 TRACE nova.rpc.amqp
>>>
>>>
>>> So I tried starting it manually:
>>>
>>> root at compute-01:/etc/libvirt/qemu# virsh create instance-00000035.xml
>>> error: Failed to create domain from instance-00000035.xml
>>> error: Cannot get interface MTU on 'br14': No such device
>>>
>>>
>>> Any idea what I might be doing wrong? All the services show :-) in
>>> "nova-manage service list".
>>>
>>>
>>> Thanks for your help...
>>>
>>> Sam
>>>
>>
>>
>> --
>> Lorin Hochstein
>> Lead Architect - Cloud Services
>> Nimbis Services, Inc.
>> www.nimbisservices.com
>>
>
>