[Openstack-operators] Migrating instances in grizzly

Aubrey Wells aubrey at vocalcloud.com
Thu Sep 5 16:00:09 UTC 2013


We had something similar happen, and it turned out to be stale tokens in
the keystone database making the nova commands take ages to run; since
live migration is time-sensitive, it would time out. I set up a script to
keep the tokens table in the keystone database cleaned up and haven't had
the problem since. It also caused other issues, like nova list and nova boot
taking minutes to return, so it may or may not be your issue if live
migration is the only thing you're seeing.
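The cleanup script itself isn't shown in the thread. A minimal Python sketch of the idea, assuming Keystone's SQL token backend keeps tokens in a table named `token` with an `expires` column holding UTC timestamps (check your own schema before running anything like this against production): delete every row whose expiry is already in the past. `purge_expired_tokens` is a hypothetical helper; the demo uses sqlite3 so it runs anywhere, and a MySQL driver would take `%s` placeholders instead of `?`.

```python
import sqlite3
from datetime import datetime, timezone


def purge_expired_tokens(conn, now=None):
    """Delete expired rows from a Keystone-style ``token`` table.

    Assumption: the table is called ``token`` and has an ``expires``
    column holding naive UTC timestamps. Returns the number of rows removed.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # 'YYYY-MM-DD HH:MM:SS' compares correctly as text and as a DATETIME.
    cutoff = now.strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.cursor()
    # sqlite3 uses '?' placeholders; MySQLdb would need '%s' instead.
    cur.execute("DELETE FROM token WHERE expires < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Demo against an in-memory database with a minimal hypothetical schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE token (id TEXT PRIMARY KEY, expires TEXT)")
    conn.executemany(
        "INSERT INTO token VALUES (?, ?)",
        [("stale", "2013-09-01 00:00:00"), ("fresh", "2099-01-01 00:00:00")],
    )
    print("removed", purge_expired_tokens(conn), "expired token(s)")
```

Run periodically (from cron, for example), this keeps the table from growing without bound; on a very large, long-neglected table you may prefer to delete in smaller batches so a single statement doesn't hold locks for a long time.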

------------------
Aubrey Wells
Director | Network Services
VocalCloud
888.305.3850
support at vocalcloud.com
www.vocalcloud.com


On Mon, Sep 2, 2013 at 8:44 PM, Juan José Pavlik Salles
<jjpavlik at gmail.com> wrote:

> I've also found this in nova-conductor.log:
>
> 2013-09-02 15:35:27.208 DEBUG nova.openstack.common.rpc.common [req-e0473533-89af-4ff5-b6fa-4b0b6eb50a6d 31020076174943bdb7486c330a298d93 d1e3aae242f14c488d2225dcbf1e96d6] Timed out waiting for RPC response: timed out _error_callback /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py:628
> 2013-09-02 15:35:27.222 ERROR nova.openstack.common.rpc.amqp [req-e0473533-89af-4ff5-b6fa-4b0b6eb50a6d 31020076174943bdb7486c330a298d93 d1e3aae242f14c488d2225dcbf1e96d6] Exception during message handling
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 430, in _process_data
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     rval = self.proxy.dispatch(ctxt, version, method, **args)
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 133, in dispatch
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     return getattr(proxyobj, method)(ctxt, **kwargs)
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 399, in network_migrate_instance_start
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     self.network_api.migrate_instance_start(context, instance, migration)
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 89, in wrapped
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     return func(self, context, *args, **kwargs)
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 501, in migrate_instance_start
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     self.network_rpcapi.migrate_instance_start(context, **args)
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/network/rpcapi.py", line 333, in migrate_instance_start
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     version='1.2')
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/proxy.py", line 80, in call
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     return rpc.call(context, self._get_topic(topic), msg, timeout)
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/__init__.py", line 140, in call
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     return _get_impl().call(CONF, context, topic, msg, timeout)
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 798, in call
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     rpc_amqp.get_connection_pool(conf, Connection))
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 612, in call
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     rv = list(rv)
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 554, in __iter__
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     self.done()
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     self.gen.next()
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 551, in __iter__
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     self._iterator.next()
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 648, in iterconsume
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     yield self.ensure(_error_callback, _consume)
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 566, in ensure
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     error_callback(e)
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 629, in _error_callback
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp     raise rpc_common.Timeout()
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp Timeout: Timeout while waiting on RPC response.
> 2013-09-02 15:35:27.222 1363 TRACE nova.openstack.common.rpc.amqp
> 2013-09-02 15:35:27.237 ERROR nova.openstack.common.rpc.common [req-e0473533-89af-4ff5-b6fa-4b0b6eb50a6d 31020076174943bdb7486c330a298d93 d1e3aae242f14c488d2225dcbf1e96d6] Returning exception Timeout while waiting on RPC response. to caller
>
> Does anybody know all the steps involved in live-migrating an instance?
> It seems to be stopping inside the network_migrate_instance_start function,
> but I really have no clue why...
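Read together, the two tracebacks suggest a chain of blocking calls: nova-compute's `_post_live_migration` calls the conductor's `network_migrate_instance_start`, and the conductor in turn makes its own blocking call, `network_rpcapi.migrate_instance_start`, and waits for the network service to reply. If that innermost reply never arrives, the conductor times out, and so does the compute node waiting on the conductor. The toy model below uses plain in-process queues rather than nova's kombu/AMQP code; `rpc_call`, `serve` and `Timeout` are hypothetical names. It is only a sketch of how one unanswered call at the end of such a chain surfaces as a timeout at every caller above it.

```python
import queue
import threading


class Timeout(Exception):
    """Stand-in for 'Timeout while waiting on RPC response.'"""


def rpc_call(request_queue, msg, timeout):
    """Send msg to a server and block until it replies or timeout expires."""
    reply_queue = queue.Queue(maxsize=1)  # private reply channel per call
    request_queue.put((msg, reply_queue))
    try:
        reply = reply_queue.get(timeout=timeout)
    except queue.Empty:
        raise Timeout("Timeout while waiting on RPC response.") from None
    if isinstance(reply, Exception):
        raise reply  # the server handled the call but sent back an error
    return reply


def serve(request_queue, handler):
    """Server loop: run handler(msg) and send back its result or exception."""
    while True:
        item = request_queue.get()
        if item is None:  # shutdown sentinel
            return
        msg, reply_queue = item
        try:
            result = handler(msg)
        except Exception as exc:  # relay errors to the caller
            result = exc
        reply_queue.put(result)


if __name__ == "__main__":
    conductor_q = queue.Queue()
    network_q = queue.Queue()  # nothing consumes this: the "network" never answers

    def conductor_handler(msg):
        # The "conductor" forwards the request and blocks on the "network" service.
        return rpc_call(network_q, msg, timeout=0.5)

    threading.Thread(target=serve, args=(conductor_q, conductor_handler),
                     daemon=True).start()
    try:
        rpc_call(conductor_q, "network_migrate_instance_start", timeout=0.5)
    except Timeout as exc:
        print("compute side:", exc)
```

Because the outer caller's timer starts first, it usually gives up before the inner one does; either way it ends with a `Timeout`, just as both the conductor and the compute logs do.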
>
>
> 2013/9/2 Juan José Pavlik Salles <jjpavlik at gmail.com>
>
>> Hi guys, last Friday I started testing live migration in my Grizzly cloud
>> with shared storage (GFS2), but I ran into a slightly weird problem:
>>
>> This is the status before migrating:
>>
>> -I have instance p9 (also called instance-00000022) running on the "acelga"
>> compute node.
>>
>> root at acelga:~/tools# virsh list
>>  Id    Name                           State
>> ----------------------------------------------------
>>  6     instance-00000022              running
>>
>> root at acelga:~/tools#
>>
>> root at cebolla:~/tool# virsh list
>>  Id    Name                           State
>> ----------------------------------------------------
>>
>> root at cebolla:~/tool#
>>
>> -Here you can see all the info about the instance
>>
>> root at cebolla:~/tool# nova --os-username=noc-admin --os-tenant-name=noc --os-password=XXXXXXX --os-auth-url http://172.19.136.1:35357/v2.0 show de2bcbed-f7b6-40cd-89ca-acf6fe2f2d09
>> +-------------------------------------+-----------------------------------------------------------+
>> | Property                            | Value                                                     |
>> +-------------------------------------+-----------------------------------------------------------+
>> | status                              | ACTIVE                                                    |
>> | updated                             | 2013-09-02T15:27:39Z                                      |
>> | OS-EXT-STS:task_state               | None                                                      |
>> | OS-EXT-SRV-ATTR:host                | acelga                                                    |
>> | key_name                            | None                                                      |
>> | image                               | Ubuntu 12.04.2 LTS (1359ca8d-23a2-40e8-940f-d90b3e68bb39) |
>> | vlan1 network                       | 172.16.16.175                                             |
>> | hostId                              | 81be94870821e17e327d92e9c80548ffcdd37d24054a235116669f53  |
>> | OS-EXT-STS:vm_state                 | active                                                    |
>> | OS-EXT-SRV-ATTR:instance_name       | instance-00000022                                         |
>> | OS-EXT-SRV-ATTR:hypervisor_hostname | acelga.psi.unc.edu.ar                                     |
>> | flavor                              | m1.tiny (1)                                               |
>> | id                                  | de2bcbed-f7b6-40cd-89ca-acf6fe2f2d09                      |
>> | security_groups                     | [{u'name': u'default'}]                                   |
>> | user_id                             | 20390b639d4449c18926dca5e038ec5e                          |
>> | name                                | p9                                                        |
>> | created                             | 2013-09-02T15:27:06Z                                      |
>> | tenant_id                           | d1e3aae242f14c488d2225dcbf1e96d6                          |
>> | OS-DCF:diskConfig                   | MANUAL                                                    |
>> | metadata                            | {}                                                        |
>> | accessIPv4                          |                                                           |
>> | accessIPv6                          |                                                           |
>> | progress                            | 0                                                         |
>> | OS-EXT-STS:power_state              | 1                                                         |
>> | OS-EXT-AZ:availability_zone         | nova                                                      |
>> | config_drive                        |                                                           |
>> +-------------------------------------+-----------------------------------------------------------+
>> root at cebolla:~/tool#
>>
>> -So I tried to move it to the other node, "cebolla":
>>
>> root at acelga:~/tools# nova --os-username=noc-admin --os-tenant-name=noc --os-password=HjZ5V9yj --os-auth-url http://172.19.136.1:35357/v2.0 live-migration de2bcbed-f7b6-40cd-89ca-acf6fe2f2d09 cebolla
>>
>> root at acelga:~/tools# virsh list
>>  Id    Name                           State
>> ----------------------------------------------------
>>
>> root at acelga:~/tools#
>>
>> No error messages at all on the "acelga" compute node so far. If I check the
>> other node, I can see the instance has been migrated:
>>
>> root at cebolla:~/tool# virsh list
>>  Id    Name                           State
>> ----------------------------------------------------
>>  11    instance-00000022              running
>>
>> root at cebolla:~/tool#
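Comparing `virsh list` on both hosts is the quickest way to confirm which hypervisor actually runs the domain. A small sketch for scripting that check, assuming `virsh list` prints its usual two header lines (column titles, then a dashed separator) followed by one row per domain; `parse_virsh_list` is a hypothetical helper. Skipping the first two lines instead of matching the column titles means it also copes with localized headers.

```python
def parse_virsh_list(output):
    """Return {domain_name: state} from the text printed by `virsh list`.

    Assumes two header lines (column titles, dashed separator) followed by
    one row per domain; states such as "shut off" may contain spaces.
    """
    domains = {}
    for line in output.strip().splitlines()[2:]:  # skip the two header lines
        fields = line.split()
        if len(fields) < 3:  # blank or malformed row
            continue
        name = fields[1]
        domains[name] = " ".join(fields[2:])
    return domains


if __name__ == "__main__":
    sample = """ Id    Name                           State
----------------------------------------------------
 6     instance-00000022              running
"""
    print(parse_virsh_list(sample))
```

Run against the outputs captured after the migration, it would map instance-00000022 to a running state on cebolla and return an empty dict on acelga.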
>>
>>
>> -BUT... after a few seconds I get this in "acelga"'s nova-compute.log:
>>
>>
>> 2013-09-02 15:35:45.784 4601 DEBUG nova.openstack.common.rpc.common [-] Timed out waiting for RPC response: timed out _error_callback /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py:628
>> 2013-09-02 15:35:45.790 4601 ERROR nova.utils [-] in fixed duration looping call
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils Traceback (most recent call last):
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 594, in _inner
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     self.f(*self.args, **self.kw)
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3129, in wait_for_live_migration
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     migrate_data)
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3208, in _post_live_migration
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     migration)
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/conductor/api.py", line 664, in network_migrate_instance_start
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     migration)
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/conductor/rpcapi.py", line 415, in network_migrate_instance_start
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     return self.call(context, msg, version='1.41')
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/proxy.py", line 80, in call
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     return rpc.call(context, self._get_topic(topic), msg, timeout)
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/__init__.py", line 140, in call
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     return _get_impl().call(CONF, context, topic, msg, timeout)
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 798, in call
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     rpc_amqp.get_connection_pool(conf, Connection))
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 612, in call
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     rv = list(rv)
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 554, in __iter__
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     self.done()
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     self.gen.next()
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 551, in __iter__
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     self._iterator.next()
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 648, in iterconsume
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     yield self.ensure(_error_callback, _consume)
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 566, in ensure
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     error_callback(e)
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 629, in _error_callback
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils     raise rpc_common.Timeout()
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils Timeout: Timeout while waiting on RPC response.
>> 2013-09-02 15:35:45.790 4601 TRACE nova.utils
>>
>>
>> -And the VM state never changes back to ACTIVE from MIGRATING:
>>
>>
>> root at cebolla:~/tool# nova --os-username=noc-admin --os-tenant-name=noc --os-password=XXXXX --os-auth-url http://172.19.136.1:35357/v2.0 show de2bcbed-f7b6-40cd-89ca-acf6fe2f2d09
>> +-------------------------------------+-----------------------------------------------------------+
>> | Property                            | Value                                                     |
>> +-------------------------------------+-----------------------------------------------------------+
>> | status                              | MIGRATING                                                 |
>> | updated                             | 2013-09-02T15:33:54Z                                      |
>> | OS-EXT-STS:task_state               | migrating                                                 |
>> | OS-EXT-SRV-ATTR:host                | acelga                                                    |
>> | key_name                            | None                                                      |
>> | image                               | Ubuntu 12.04.2 LTS (1359ca8d-23a2-40e8-940f-d90b3e68bb39) |
>> | vlan1 network                       | 172.16.16.175                                             |
>> | hostId                              | 81be94870821e17e327d92e9c80548ffcdd37d24054a235116669f53  |
>> | OS-EXT-STS:vm_state                 | active                                                    |
>> | OS-EXT-SRV-ATTR:instance_name       | instance-00000022                                         |
>> | OS-EXT-SRV-ATTR:hypervisor_hostname | acelga.psi.unc.edu.ar                                     |
>> | flavor                              | m1.tiny (1)                                               |
>> | id                                  | de2bcbed-f7b6-40cd-89ca-acf6fe2f2d09                      |
>> | security_groups                     | [{u'name': u'default'}]                                   |
>> | user_id                             | 20390b639d4449c18926dca5e038ec5e                          |
>> | name                                | p9                                                        |
>> | created                             | 2013-09-02T15:27:06Z                                      |
>> | tenant_id                           | d1e3aae242f14c488d2225dcbf1e96d6                          |
>> | OS-DCF:diskConfig                   | MANUAL                                                    |
>> | metadata                            | {}                                                        |
>> | accessIPv4                          |                                                           |
>> | accessIPv6                          |                                                           |
>> | OS-EXT-STS:power_state              | 1                                                         |
>> | OS-EXT-AZ:availability_zone         | nova                                                      |
>> | config_drive                        |                                                           |
>> +-------------------------------------+-----------------------------------------------------------+
>> root at cebolla:~/tool#
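To spot an instance stuck like this from a script, one option is to parse the `nova show` table and check `status` and `OS-EXT-STS:task_state`. A minimal sketch, assuming the Property/Value layout shown above; `parse_nova_show` and `looks_stuck_migrating` are hypothetical helpers, not part of python-novaclient.

```python
def parse_nova_show(output):
    """Return {property: value} from the table printed by `nova show`."""
    props = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and anything else
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Property":
            continue  # skip the header row and rows we cannot split cleanly
        props[cells[0]] = cells[1]
    return props


def looks_stuck_migrating(props):
    """True while nova still reports the instance as mid-migration."""
    return (props.get("status") == "MIGRATING"
            or props.get("OS-EXT-STS:task_state") == "migrating")


if __name__ == "__main__":
    sample = """+-----------------------+-----------+
| Property              | Value     |
+-----------------------+-----------+
| status                | MIGRATING |
| OS-EXT-STS:task_state | migrating |
| OS-EXT-SRV-ATTR:host  | acelga    |
+-----------------------+-----------+"""
    props = parse_nova_show(sample)
    print(props["OS-EXT-SRV-ATTR:host"], looks_stuck_migrating(props))
```

Applied to the second `nova show` output above, it would report status MIGRATING, task_state migrating and `OS-EXT-SRV-ATTR:host` still `acelga`, even though libvirt on cebolla is already running the domain. That mismatch is consistent with the post-migration step failing before nova's records were updated.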
>>
>>
>> Interesting fact:
>> -The VM still answers ping after the migration, so I think this part is fine.
>>
>> Any ideas about this problem? At first I thought it could be a
>> connectivity problem between the nodes, but the VM migrates completely at
>> the hypervisor level; it seems some "instance has been migrated"
>> acknowledgement is missing.
>>
>>
>> --
>> Pavlik Salles Juan José
>>
>
>
>
> --
> Pavlik Salles Juan José
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>